EVALUATION


This report assesses the feasibility of using the currently existing
Representation-Independent Programming System as a means of making known data
available as a resource to a very large data base query system. It has
direct application wherever large data aggregates are accessed by computer.
INTRODUCTION

Purpose 

The purpose of this document is to establish the feasibility of the
Representation-Independent Programming System (RIPS) as an implementation
vehicle for a Knowledge Management (KM) test bed. KM requirements1,2 are
analyzed and allocated to existing RIPS components, forming a preliminary
functional allocation. Extensions of both KM concepts and RIPS capabilities are
then determined, resulting in a final correlation matrix describing RIPS
functional specifications.

The  current  status  of  RIPS  is  then  discussed,  and  a development  plan  is 
proposed  for  completion  of  a prototype  system  that  includes  estimates  of 
facilities  and  manpower  as  well  as  projected  estimates  of  performance  char- 
acteristics. Finally,  our  conclusions  are  presented. 

Scope 

The  next  section  describes  RIPS  components  in  the  context  of  an  existing 
system.  In  reality,  RIPS  is  a collection  of  concepts,  methodologies,  and 
software,  still  in  the  research  phase.  The  descriptions  are  brief,  refer- 
encing existing  documentation  and  published  papers.  Some  components  are  in 
the  form  of  technical  notes  or  papers  in  progress,  and  it  is  impractical  to 
include  them.  Neither  is  it  practical  to  develop  formal  documentation  or 
equivalent  details  for  the  current  effort.  In  these  cases,  only  pertinent 
features  and  their  underlying  concepts  are  described. 

The fourth section lists KM requirements taken from Section 2 of
Reference 1 and Appendix A.1 of Reference 2. Each requirement is a brief
synopsis, sequenced in order of appearance, accompanied by the KM page reference
and corresponding references to discussions of its allocation to RIPS
components in later sections of this document.

The fifth section contains a preliminary allocation of KM functions to
RIPS components. It is organized on the basis of the KM Logical System
Design. KM requirements specified in this section are discussed in
conjunction with those of the previous section.

The  sixth  and  seventh  sections  discuss  extensions  to  RIPS  required  by 
KM  functional  requirements,  and  extensions  to  KM  concepts  made  possible 
through  RIPS,  respectively. 


1. James F. Berry and Craig M. Cook: Managing Knowledge as a Corporate
Resource. Contract source document, Version 4.5, 28 May 1976.

2. James F. Berry and Craig M. Cook: Viewing Knowledge as a Resource in
Federal Departments of the U.S. Government. Economic Research Service,
U.S. Department of Agriculture, September 1977.


The eighth section presents the correlation of KM functions with RIPS
components,  including  extensions  from  the  fifth  and  sixth  sections,  in  an 
allocation  matrix.  The  result  is  a final  functional  assignment,  realizing 
the  KM  concept  through  RIPS. 

The  ninth  section  discusses  the  current  status  of  RIPS  components.  This 
is  an  important  section  because  RIPS  is  a research  project  and,  even  though 
the basic concept appears sound, there is always some risk in the transition
from  research  to  development.  The  technology  needed  to  complete  the  RIPS 
functional  design  to  the  degree  necessary  for  development  is  discussed  item 
by  item  and  includes  the  effect  on  other  RIPS  components. 

The  tenth  section  contains  a work  plan  for  development  of  a KM  test  bed 
or  prototype  RIPS.  Rough  order-of-magnitude  (ROM)  estimates  of  hardware  and 
software  facilities  for  the  test  bed  are  presented,  along  with  estimates  of 
expected  performance  based  on  empirical  results  of  current  software. 

The  last  section  presents  our  conclusions. 


REPRESENTATION-INDEPENDENT PROGRAMMING SYSTEM COMPONENTS

General 

The RIPS consists of concepts, methodologies, and software designed to
provide solutions to many problems existing in current database system and
Management Information System (MIS) implementations and technology.
Individual components of RIPS have been developed over a period of several
years by the Martin Marietta Database Research Project. Including the
results of other researchers (both academic and industrial) incorporated in
the RIPS where practical, current progress is the result of more than 100
man-years of research and development.

Each  RIPS  component  has  been  directed  toward  a specific  problem  or  class 
of  problems,  with  the  underlying  philosophy  that  mutual  compatibility  is 
ensured. There is not a single document describing RIPS. Rather, there
are several documents and papers that describe individual concepts and
software. Also, some of the more recently developed concepts are still in the
form of working papers.

This section summarizes the RIPS components. References to published
papers  or  other  documents  are  provided  when  they  exist.  In  the  following 
paragraph,  we  present  a chronology  of  Database  Research  Project  accomplish- 
ments to  provide  an  overview  of  RIPS  evolution  and  to  emphasize  the  research 
nature  of  the  project. 

Background - The project was formed in early 1974 to perform research on
the selection process of Generalized Database Management Systems (GDBMS) and
to develop a prototype simulator to evaluate the performance of candidate
GDBMSs. An essential element of the simulator is a representation-independent
statement of the work load or traffic of the application under study.
This was satisfied by first stating all data requirements in terms of a
canonic representation-independent data model, and second by quantifying all
application functions in terms of users and data sources. The canonic model
chosen was the Entity Set Model,3 and a method for quantifying application
functions was developed and is referred to as Quantitative Data Description
(QDD).4

Another essential element of the simulator is a model of the candidate
GDBMSs to implement the application under study. The Data-Independent
Accessing Model (DIAM I)3,5 was chosen because of its completeness in
describing the various levels of data storage and accessing techniques in a single,
standardized model. The DIAM describes the implementation of candidate
DBMSs in terms of the Entity Set Model, which was the major reason for
selecting the Entity Set Model as the data model for the QDD.

The  remaining  element  of  the  simulator  is  a host-computer  model  de- 
scribing the  performance  of  candidate  computer  systems  in  processing  dis- 
crete-event application  functions  generated  by  the  simulator  as  described 
by  the  QDD. 


The  results  of  this  segment  of  research  and  development  are  in  Refer- 
ences 6,  7,  8,  and  9. 


In 1975, the project's scope was expanded to investigate the use of the
same concepts employed in developing the simulator for distributed
heterogeneous database systems. The objective was to allow users at one node in


3. M. E. Senko et al.: "Data Structures and Accessing in Database Systems,"
IBM Systems Journal, No. 1, 1973, pp 30-93.

4. L. S. Schneider and C. R. Spath: "Quantitative Data Description,"
Proc. ACM SIGMOD International Conference on Management of Data,
San Jose, California, May 1975, pp 167-195 (ed. W. F. King).

5. M. E. Senko et al.: A Data-Independent Architecture Model I: Four Levels
of Description from Logical Structures to Physical Search Structures.
IBM Research Report RJ 982, February 1972.

6. L. S. Schneider and T. W. Connolly: "Generalized Data Base Management
System Simulator," Proc. 1976 Winter Simulation Conference, Vol 2,
December 1976 (ed. H. J. Highland, et al.).

7. Martin Marietta Database Research Project: GDMS Batch Mode Simulator,
Functional Specification, Design Specification, and User's Guide. NASA
Contract Documentation, NAS9-13951, Johnson Space Center, Houston, Texas,
September 1975.

8. Martin Marietta Database Research Project: GDMS Real-Time Simulator,
Functional Specification, Design Specification, and User's Guide. NASA
Contract Documentation, NAS9-13951, Johnson Space Center, Houston, Texas,
September 1975.

9. Martin Marietta Database Research Project: Data Dictionary Research,
NASA Contract Documentation, NAS9-13951, Johnson Space Center, Houston,
Texas, September 1975.
a network to access data from other nodes without knowledge of where or how
the data are implemented. This work addressed both the end-user facility
requirements and the mapping of users' queries to the distributed systems.

Early in the work, it was recognized that an essential element in this
environment is a canonic data model and a sufficiently powerful query
language in terms of the data model. Much significant research had already
produced useful results in this area using the Relational Data Model10 as
the canonic model. To take advantage of these results, the project chose to
replace the Entity Set Model with the Relational Data Model, first
demonstrating that DIAM concepts were still valid in terms of the Relational
Model.11 Specifically, this decision was based on the belief that Relational
Query Languages such as SEQUEL,12 QUEL,13 and others14,15 would suffice
as a representation-independent query language.

However, later investigations determined that existing languages did not
offer clear semantic separation of application functions, and despite their
promise, the practice of embedding the query language in a general-purpose
programming language was still necessary.12,16 It was not clear that the
ANSI-X3-SPARC architecture17 could be adequately realized because external
mappings were still expressed in representation-dependent terms, and access
to derived data was limited by the language. To fully implement ANSI-X3-SPARC
architecture, it is necessary that users' queries be independent not
only of internal representations, but also of external representations.


10. E. F. Codd: "A Relational Model of Data for Large Shared Data Banks,"
Communications of the ACM, Vol 13, No. 6, June 1970, pp 377-387.

11. L. S. Schneider: "A Relational View of the Data Independent Accessing
Model," ACM SIGMOD International Conference on Management of Data,
Washington, D.C., June 1976, pp 75-90 (ed. James B. Rothnie).

12. Morton M. Astrahan and Donald D. Chamberlin: Implementation of a
Structured English Query Language. RJ1464, IBM Research Center, San
Jose, California, October 28, 1974.

13. M. Stonebraker, E. Wong, and P. Kreps: "The Design and Implementation
of INGRES," ACM Transactions on Database Systems, Vol 1, No. 3,
September 1976, pp 189-222.

14. M. M. Zloof: "Query by Example," Proc. National Computer Conference,
AFIPS Press, Vol 44, 1975, pp 431-438.

15. E. F. Codd: "A Database Sublanguage Founded on the Relational Calculus,"
Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and
Control, San Diego, California, November 1971, pp 35-68.

16. E. Allman, M. Stonebraker, and G. Held: "Embedding a Relational Data
Sublanguage in a General-Purpose Programming Language," ACM SIGPLAN
Notices, Vol 11, Special Issue, Salt Lake City, Utah, March 1976,
pp 25-35.

17. ANSI/X3/SPARC Study Group: Database Management Systems, Interim Report.
FDT 7, No. 2, ACM, New York, 1975.


Thus, it became necessary to develop a representation-independent
programming language, and this was begun in mid 1976. By early 1977, the language
concepts were sufficiently defined to complete a conceptual design of the
end-user facility18 and to begin developing prototype software of a
sufficient query compiler for distributed heterogeneous database systems.19

Before  describing  RIPS  components,  we  present  conceptual  views  of  the 
ANSI-X3-SPARC  architecture  and  a distributed  information  system.  The  pur- 
pose of  this  discussion  is  to  introduce  the  major  components  of  RIPS  as  an 
implementation of ANSI-X3-SPARC architecture in a distributed information
system  environment. 

Conceptual  View  of  ANSI-X3-SPARC  Architecture  - The  ANSI-X3-SPARC  study 
group's proposed architecture17 presents a view of information systems that
is  symmetric  with  respect  to  the  internal  and  external  mappings  that  occur. 

The  conceptual  schema  provides  a canonic  model  of  the  data  to  which  users 
address  their  queries  and  for  which  implementers  provide  efficient  access. 

In terms of modules in the architecture, end-user functions or external
mappings are performed by the End-User Facility (EUF), and internal mappings
are performed by the Data Management System (DMS). To perform the mappings,
the modules require access to the external and internal schemata, which are
user-specified to the extent that flexibility is accommodated by the EUF and
DMS, respectively. Thus, users' queries, submitted via the interfacing
techniques provided, must be mapped to representation-independent queries in
terms of the canonic data model, and subsequently mapped to the database by
the DMS. The results must then be presented by the DMS in terms of the
canonic data model, then mapped to the user's display device by the EUF.

The  architecture  is  shown  in  Figure  1 in  terms  of  the  EUF,  the  canonic 
data model, and the DMS. The role of the data dictionary/directory is shown
as  the  repository  of  the  metadata. 
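
The two mappings described above can be sketched in a few lines. This is a minimal illustration only, not RIPS software: the schema tables, the EMPLOYEE relation, and the function names are all hypothetical, and both schemata are reduced to simple lookup tables.

```python
# Hypothetical names throughout; the report defines no programming interface.

# External schema: maps a user's vocabulary to canonic relation and attribute names.
EXTERNAL_SCHEMA = {"staff": ("EMPLOYEE", {"name": "EMP-NAME", "dept": "DEPT-NO"})}

# Internal schema: maps canonic relation names to where the DMS stores them.
INTERNAL_SCHEMA = {"EMPLOYEE": "emp_file"}

# Stand-in for the stored database, keyed by storage name.
STORAGE = {"emp_file": [{"EMP-NAME": "Jones", "DEPT-NO": 7},
                        {"EMP-NAME": "Smith", "DEPT-NO": 9}]}


def euf_to_canonic(view, field, value):
    """External mapping: user terms -> representation-independent query."""
    relation, attrs = EXTERNAL_SCHEMA[view]
    return {"relation": relation, "where": (attrs[field], value)}


def dms_execute(query):
    """Internal mapping: canonic query -> stored data -> canonic tuples."""
    rows = STORAGE[INTERNAL_SCHEMA[query["relation"]]]
    attr, value = query["where"]
    return [row for row in rows if row[attr] == value]


def euf_present(view, rows):
    """External mapping of results: canonic tuples -> the user's display terms."""
    _, attrs = EXTERNAL_SCHEMA[view]
    return [{user: row[canon] for user, canon in attrs.items()} for row in rows]


canonic_query = euf_to_canonic("staff", "dept", 7)
result = euf_present("staff", dms_execute(canonic_query))
```

Note that the canonic query mentions neither the user's field names nor the storage name, so either schema can change without affecting it.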

Conceptual  View  of  RIPS  as  an  Example  of  ANSI-X3-SPARC  Architecture  - 
RIPS  conforms  to  ANSI-X3-SPARC  architecture,  providing  modules  to  perform 
the  mappings.  The  conceptual  view  (Fig.  2)  shows  the  corresponding  RIPS 
modules  for  ANSI-X3-SPARC  components.  The  figure  also  points  out  that  RIPS 
does  not  replace  an  existing  system  but  rather  provides  an  interface  to 
whatever  system  is  implemented.  Thus,  the  DMS  shown  in  Figure  2 is  respon- 
sible for  accessing  stored  and  derived  data  under  the  direction  of  RIPS. 

The purpose of this approach, of course, is for RIPS to provide


18. C. R. Spath and L. S. Schneider: "A Generalized End-User Facility for
Relational Database Systems," Proc. Third International Conference on
Very Large Databases, Tokyo, Japan, October 1977.

19.  L.  S.  Schneider:  "A  Relational  Query  Compiler  for  Distributed  Hetero- 
geneous Databases,"  Submitted  for  publication  in  ACM  Transactions  on 
Database  Systems,  January  1977. 

Figure 1. ANSI-X3-SPARC architecture.


Figure 2. RIPS as an implementation of ANSI-X3-SPARC architecture.


an  interface  to  a distributed  heterogeneous  database  without  requiring  that 
the  system  in  the  network  be  reprogrammed  to  accommodate  remote  accesses  to 
either  its  data  or  its  processes. 

Conceptual  View  of  a Distributed  Information  System  Environment  - In  the 
distributed  information  system  environment,  several  systems  are  linked  to- 
gether in  a network  so  that  a user  at  one  node  may  access  data,  or  use  pro- 
cesses, at  one  or  more  nodes  without  knowledge  of  where  or  how  in  the  net- 
work the  data  are  stored  or  derived,  as  shown  in  Figure  3. 

The  environment  we  envision  is  that  large  and  diverse  systems  exist  and 
are  justified,  but  additional  demands  arise  that  suddenly  require  access  to 
their  resources  by  other  systems.  Therefore,  centralization  is  not  an 


alternative, and, because the systems are large and well established,
standardization and transformation are precluded for the reason stated in
Reference 2.*


This leads to the view shown in Figure 4, in which users' queries are
decomposed into subqueries pertinent to each node, and the subqueries are
translated into the language of the target DMS. Of course, a similar
process must take place with respect to returned data, either for display or
substitution in other subqueries as qualifier values.
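
The decomposition step can be illustrated with a small sketch. It assumes a hypothetical two-node network holding SUPPLIER and SHIPMENT relations, and it omits the translation into each target DMS language; the point is only that values returned by one subquery are substituted as qualifier values in the next.

```python
# Hypothetical two-node network; each node holds one relation as a list of rows.
NODES = {
    "A": {"SUPPLIER": [{"S-NO": 1, "CITY": "Denver"},
                       {"S-NO": 2, "CITY": "Tokyo"}]},
    "B": {"SHIPMENT": [{"S-NO": 1, "PART": "bolt"},
                       {"S-NO": 2, "PART": "nut"},
                       {"S-NO": 1, "PART": "gear"}]},
}


def run_subquery(node, relation, attr, allowed, select):
    """Run one subquery at one node: restrict on attr, then project on select."""
    rows = NODES[node][relation]
    return [row[select] for row in rows if row[attr] in allowed]


def parts_from_city(city):
    """Decompose 'parts shipped by suppliers in <city>' into two subqueries."""
    # Subquery 1 (node A): supplier numbers located in the city.
    supplier_numbers = run_subquery("A", "SUPPLIER", "CITY", {city}, "S-NO")
    # Subquery 2 (node B): the values returned by subquery 1 are substituted
    # as qualifier values.
    return run_subquery("B", "SHIPMENT", "S-NO", set(supplier_numbers), "PART")


denver_parts = parts_from_city("Denver")
```

The caller of `parts_from_city` never names a node; only the decomposition routine knows where each relation resides.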

Incorporating RIPS architecture in a distributed information system
results in the conceptual view shown in Figure 5. We will return to this
discussion after describing RIPS components.

Information  Structure 


The Information Structure (IS) provides the implementation-independent
conceptual model in RIPS. The IS is founded on the Relational Data Model,10
extended to also allow relational descriptions of stored algorithms. At the
information structure level, there is no distinction between totally stored
data and totally derived data (algorithms), nor is there any distinction



* Of  course,  there  may  be  other  environments  in  which  centralization  is 
called  for,  or  in  which  standardization  and/or  transformation  are  jus- 
tified (e.g.,  small  databases  and  stable  requirements). 

Figure 4. Functional view of a distributed information system.

between  data  implemented  in  an  automated  database  and  manually  stored  data. 
Thus,  queries  expressed  in  terms  of  the  IS  require  no  knowledge  of  where  or 
how  the  data  are  implemented,  and  therefore  remain  stable  in  the  face  of 
changing implementations.

In  a distributed  heterogeneous  database  system  environment,  a single 
information  structure  describes  what  data  are  available  in  the  network. 

This  single  model  is  sufficient  for  both  external  and  internal  mappings. 

Stored Data - Stored data are described as third-normal-form (TNF)
relations. The primary identifier (a single attribute or multiple attributes
concatenated) is recorded as such, and secondary identifiers may be
declared. The distinction between domain and attribute-role-name is
maintained, and both are recorded. A single relation may be declared even though
some instances are stored at one node and some at another (i.e.,
restriction distribution) or some attributes of the relation are stored at
different nodes (i.e., projection distribution).
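
As a rough illustration of the two kinds of distribution, the following sketch recombines fragments of a single declared relation. The EMPLOYEE fragments and function names are hypothetical; the sketch assumes restriction fragments are recombined by union and projection fragments by a join on the primary identifier.

```python
# Restriction distribution: some instances (rows) of the relation at each node.
node1_rows = [{"EMP-NO": 1, "NAME": "Jones"}]
node2_rows = [{"EMP-NO": 2, "NAME": "Smith"}]

# Projection distribution: some attributes at another node, sharing the
# primary identifier EMP-NO.
node3_cols = [{"EMP-NO": 1, "SALARY": 900}, {"EMP-NO": 2, "SALARY": 800}]


def union_fragments(*fragments):
    """Recombine restriction fragments: the relation is their union."""
    return [row for fragment in fragments for row in fragment]


def join_fragments(left, right, key):
    """Recombine projection fragments: join on the primary identifier."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]


employee = join_fragments(union_fragments(node1_rows, node2_rows),
                          node3_cols, "EMP-NO")
```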




Figure  5.  RIPS  applied  to  distributed  information  system. 

Derived  Data  - Computational  functions  are  described  as  TNF  relations, 
just  as  stored  data  are.  For  example,  the  SINE  function,  computed  algo- 
rithmically (e.g.,  Taylor  series  expansion)  can  be  described  as 

SINE(θ, SIN-θ)

just  as  a table  of  stored  values  would  be.  There  is  no  distinction  at 
the  IS  level.  Data  derived  partially  from  stored  data  and  partially  from 
stored  algorithms  (e.g.,  interpolation  over  stored  values)  are  described 
via  derivations,  discussed  in  later  paragraphs. 
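
The absence of any distinction between stored and derived data can be sketched as two classes sharing one lookup interface. The class and function names are hypothetical, and the truncated Taylor series stands in for whatever algorithm an installation actually stores.

```python
import math


class StoredRelation:
    """A relation whose tuples are stored explicitly."""

    def __init__(self, pairs):
        self._pairs = dict(pairs)

    def lookup(self, key):
        return self._pairs[key]


class DerivedRelation:
    """A relation whose tuples are computed by a stored algorithm."""

    def __init__(self, algorithm):
        self._algorithm = algorithm

    def lookup(self, key):
        return self._algorithm(key)


def taylor_sine(theta, terms=10):
    """Truncated Taylor series for sin(theta) about zero."""
    return sum((-1) ** n * theta ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))


sine_table = StoredRelation({0.0: 0.0, 0.5: math.sin(0.5)})
sine_algorithm = DerivedRelation(taylor_sine)

# The caller uses the same interface whether the value is stored or derived.
stored_value = sine_table.lookup(0.5)
derived_value = sine_algorithm.lookup(0.5)
```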

An extension of the Relational Data Model allows the representation-independent
description of aggregation algorithms by permitting
specification of second-order domains. For example, the summation algorithm

$\text{SUM-A} = \sum_{a \in A'} a$

can be described as the relation

SUM(A', SUM-A)

where A' denotes a second-order attribute (e.g., a bag), and SUM-A
represents the result of the algorithm.

Representation-Independent Programming Language (RIPL)

The  RIPS  programming  language  is  RIPL.  It  is  a representation-indepen- 
dent  nonprocedural  programming  language,  and,  at  the  user's  level,  eliminates 
current  requirements  for  embedding  the  query  language  in  a general-purpose 
programming  language  to  direct  procedures  for  database  accesses  and  compu- 
tations. RIPL  queries  (including  retrievals,  adds,  deletes,  and  changes) 
specify  what  information  is  required,  not  how  processing  is  to  be  accom- 
plished. 

RIPL is separated into two conceptual levels: RIPL_o and RIPL_n. The
major conceptual difference is that statements in RIPL_o reference only the
IS and are processable by the RIPS query compiler/translator. RIPL_n
statements reference user views and derivations and must be reduced to RIPL_o
through decomposition by the GEUF RIPL preprocessor before execution.

A statement  in  RIPL  references  a single  relation  as  its  range  and  re- 
sults, conceptually,  in  a single  relation  that  is  a projection  and/or  re- 
striction of  the  range.  Joins  are  implicit  in  RIPL,  specified  by  linking 
predicates  in  the  qualifier. 

A RIPL  query  is  one  or  more  RIPL  statements,  whose  results  form  a sub- 
set of  the  total  information  structure  and  are  thus  independent  of  the  ex- 
ternal representations  required  (i.e.,  display  formats).  RIPL  is  founded 
on the first-order predicate calculus extended to second-order predication,
made  possible  by  the  RIPS  extended  IS.  RIPL  contains  no  built-in  computa- 
tional operators,  allowing  whatever  computations  are  desired  to  be  expressed 
in  a consistent  manner.  This  feature  permits  RIPL  queries  to  be  translated 
into  other  languages  whose  set  of  built-in  operators  may  differ.  For  ex- 
ample, if  the  SUM  relation  described  earlier  is  referenced  in  a RIPL  query, 
and  the  target  query  language  does  not  contain  a built-in  SUM  operator,  the 
translation  must  be  to  a language  that  does  provide  the  algorithm.  Thus, 
the  result  of  the  translation  might  be  a general-purpose  programming  lan- 
guage that  embeds  the  target  query  language  so  that  applicable  retrievals 
are  in  the  DBMS  language  and  the  summing  algorithm  is  a subroutine  call  or 
a procedure  in  the  general-purpose  language.  Thus,  computational  operations 
available  for  RIPL  queries  are  extensible  to  include  whatever  computational 
algorithms  are  implemented  for  a given  installation. 
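
The translation decision described above might look like the following sketch. The catalog of target-language built-ins and the table of host algorithms are assumptions made for illustration; a real translator would emit target-language text rather than compute locally.

```python
# Hypothetical catalog of built-in operators per target query language.
TARGET_BUILTINS = {"lang_with_sum": {"SUM"}, "lang_without_sum": set()}

# Installation-supplied algorithms available in the host programming language.
HOST_ALGORITHMS = {"SUM": sum}


def plan_aggregate(operator, target):
    """Decide where an aggregate referenced in a query is evaluated."""
    if operator in TARGET_BUILTINS[target]:
        return "target"   # express it directly in the target query language
    if operator in HOST_ALGORITHMS:
        return "host"     # retrieve via the DBMS, compute in host-language code
    raise LookupError(f"no implementation of {operator} at this installation")


def evaluate(operator, target, retrieved_values):
    """Evaluate the aggregate over a bag of values already retrieved."""
    where = plan_aggregate(operator, target)
    # In this sketch both plans compute locally; only the plan label differs.
    return where, HOST_ALGORITHMS[operator](retrieved_values)


plan, total = evaluate("SUM", "lang_without_sum", [3, 3, 4])
```

Adding an entry to `HOST_ALGORITHMS` is all that is needed to make a new computation available, which mirrors the extensibility claimed for RIPL.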

RIPL_o contains only two relational operators for predication: ∈ (element
of) and ∉ (not an element of). Any additional relational operators desired
(e.g., >, =, <, SELLS, EXPORTS, HIGHER-THAN, etc) are declared as derivations
(page 18), and thereafter may be used in RIPL_n queries. Thus, relational
operators in RIPL are extensible to include whatever form of predication is
user-defined for an installation.
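
One way to picture operators declared as derivations is to treat each operator as a binary relation and reduce every comparison to an element-of test. The finite domain, the `DERIVATIONS` table, and the function names below are hypothetical.

```python
# Hypothetical finite domain, so that the derived relation can be enumerated.
DOMAIN = range(5)

# Derivation: GREATER-THAN declared as a binary relation over the domain.
DERIVATIONS = {
    "GREATER-THAN": {(x, y) for x in DOMAIN for y in DOMAIN if x > y},
}


def holds(operator, left, right):
    """Evaluate 'left OPERATOR right' using only the element-of test."""
    return (left, right) in DERIVATIONS[operator]


def declare(operator, pairs):
    """Declare a new relational operator as a derivation."""
    DERIVATIONS[operator] = set(pairs)


declare("SELLS", {("Acme", "bolt"), ("Acme", "nut")})
```

Here `holds("SELLS", "Acme", "bolt")` and `holds("GREATER-THAN", 3, 1)` are evaluated by exactly the same membership test.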

Examples  of  RIPL  queries  and  concepts  are  in  Reference  18  and  through- 
out this  document. 

Data  Dictionary/Directory  (DD/D) 

In  RIPS,  the  DD/D  is  considered  part  of  the  database.  Its  major  purpose 
is  to  record  metadata.  However,  metadata  are  as  important  to  an  organization 
as  any  other  data,  and  are  therefore  made  accessible  with  all  the  same  user 
techniques  and  facilities.  This  is  accomplished  in  RIPS  by  first  viewing  all 
metadata  as  stored  relations,  and  second,  including  descriptions  of  these  re- 
lations in  the  IS.  Thus,  queries  for  metadata  are  simply  representation-in- 
dependent queries  to  (a  segment  of)  the  database  that  can  be  implemented  in 
whatever manner is appropriate for its traffic. The query compiler/translator
directs the queries accordingly.

This  concept  also  allows  all  the  same  techniques  of  integrity  and  author- 
ization management  provided  for  any  data  to  be  applied  to  metadata  because 
these  controls  are  implemented  at  the  information  structure  level  in  RIPS, 
as  described  later. 
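
A small sketch may help show what it means for metadata to be queried like any other data. The ATTRIBUTE relation and the `select` routine are hypothetical; the sketch assumes only that metadata are held as relations alongside ordinary relations.

```python
# Ordinary data and metadata held as relations in one hypothetical database.
DATABASE = {
    "EMPLOYEE": [{"EMP-NO": 1, "NAME": "Jones"}],
    # Metadata relation describing the attributes of every relation,
    # including itself.
    "ATTRIBUTE": [
        {"RELATION": "EMPLOYEE", "ATTR": "EMP-NO"},
        {"RELATION": "EMPLOYEE", "ATTR": "NAME"},
        {"RELATION": "ATTRIBUTE", "ATTR": "RELATION"},
        {"RELATION": "ATTRIBUTE", "ATTR": "ATTR"},
    ],
}


def select(relation, **conditions):
    """One query routine serves both data and metadata."""
    return [row for row in DATABASE[relation]
            if all(row[attr] == value for attr, value in conditions.items())]


employee_attrs = [row["ATTR"] for row in select("ATTRIBUTE", RELATION="EMPLOYEE")]
jones = select("EMPLOYEE", NAME="Jones")
```

Because both calls pass through `select`, any integrity or authorization check placed there would apply to metadata and data alike.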

Metadata Management - The data directory contains implementation
descriptions of both internal storage and external representations of user
interfaces and display formats in DIAM descriptions viewed as relations. The
directory is available to the query compiler to compile RIPL queries into
representation-dependent queries and to the DBA to record and interrogate the
details in the management process.

The data dictionary contains metadata that are primarily specifications
of user-oriented functions, including user views, predefined queries,
integrity assertions, etc. The dictionary is available to the GEUF to provide
user interfaces; to the RIPL preprocessor to decompose RIPL_n queries into
RIPL_o queries; and to the enterprise administrator in the management of data
resources, through the declarations of access controls and the interrogation
of current metadata.

Data  Description  Language  (DDL)  - The  RIPS  concept  eliminates  the  dis- 
tinction between  a separate  DDL  and  DML.  Because  the  contents  of  the  DD/D 
are  viewed  as  relations  and  are  recorded  in  the  IS,  queries  to  add,  change, 
or  delete  the  corresponding  metadata  instances  are  simple  RIPL  queries.  Any 
desired simplified, interfacing user language can be defined using the GEUF
capabilities.


Quantitative Data Descriptions (QDD) - The QDD4 extends the view of the
real world, represented by the database, to include representations of the
organization itself, and to describe the organization's use of the database.
Thus, users (e.g., data sources, product users, etc) are represented in the
QDD, along with descriptions of what data they use, how, and at what rate.

The  composite  use — across  all  users — describes  the  dynamics  of  the  real 
world  including  the  organizational  use.  These  constitute  the  requirements 
of the system; as the users' profiles change, the system is required to
continue to meet the changing demands. Users' profiles will change as
either the external real world or the organization itself changes.
Thus,  the  database  system's  implementation  is  constantly  subject  to  change 
to  accommodate  changing  requirements.  The  QDD  can  be  maintained  dynamically 
or  recomputed  periodically.  Either  choice  is  available  through  the  GEUF. 

The  QDD  is  stored  in  the  data  dictionary  and  is  available  to  the  enterprise 
administrator  to  provide  visibility  of  organizational  data  flows  or  to  the 
DBA  to  access  current  requirements.  In  addition,  the  QDD  is  used  by  the 
query  compiler's  optimizer  to  assess  the  effects  of  data  populations  and 
population  distributions  on  candidate  search  paths  in  the  search-path  selec- 
tion process. 
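
The optimizer's use of population data can be sketched as follows. The statistics, the uniform-distribution assumption, and the cost measure (expected matching tuples) are all illustrative assumptions; the report does not specify how the optimizer weighs candidate search paths.

```python
# Hypothetical QDD statistics: population of each relation and the number of
# distinct values of each attribute.
POPULATION = {"SUPPLIER": 100, "SHIPMENT": 10_000}
DISTINCT = {("SUPPLIER", "CITY"): 20, ("SHIPMENT", "S-NO"): 100}


def expected_matches(relation, attr):
    """Expected tuples matching one equality qualifier (uniform distribution assumed)."""
    return POPULATION[relation] / DISTINCT[(relation, attr)]


def cheapest_path(candidates):
    """Pick the candidate entry point with the fewest expected matches."""
    return min(candidates, key=lambda candidate: expected_matches(*candidate))


best = cheapest_path([("SUPPLIER", "CITY"), ("SHIPMENT", "S-NO")])
```

If the QDD is maintained dynamically, the same query can be compiled to a different search path as populations shift.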

Application  Function  Descriptions  - 

Partially Predefined Queries - Typical queries to an information system
constitute a continuum ranging from ad hoc queries to real-time displays as
described in Reference 18. Queries are stated in RIPL in terms of user
views, derivations, and/or the basic information structure. Queries that are
executed repeatedly are predefined and stored as relations in the data
dictionary, and the correlation to the stimulus that will initiate their
execution is also declared and stored in the dictionary. Queries that are
fully determined as to context and whose stimulus is internally generated
(e.g., clock time, other queries, etc) are fully predefined and stored.
Queries that are partially predefined, requiring either an externally
generated stimulus (e.g., function key, command, light pen, etc) or
particularized values of qualification predicates, selected attributes for display,
current method of display, etc, are also stored as relations for which the
missing values are to be supplied by the user at execution time.

Because  partially  predefined  queries  are  viewed  as  relations,  the  user- 
supplied  values  are,  in  essence,  change  queries  to  update  the  relations. 

The external representation of the user-supplied data (i.e., form) is defined
by DIAM descriptions stored in the data directory. When the form is
transmitted from an external device, the GEUF directs the mapping to the
relations to complete the partially predefined query as described under
Generalized End-User Facility (GEUF).
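
The completion of a partially predefined query can be sketched as an update to a stored row. The query name, its fields, and the use of `None` to mark missing values are hypothetical choices for illustration.

```python
# A partially predefined query stored as a row; None marks the values the
# user must supply at execution time. All names are hypothetical.
PREDEFINED = {
    "emp-by-dept": {"relation": "EMPLOYEE", "attr": "DEPT-NO",
                    "value": None, "display": None},
}


def complete(query_name, form_values):
    """Apply the user-supplied form as an update to a copy of the stored row."""
    query = dict(PREDEFINED[query_name])   # leave the stored definition intact
    for field, value in form_values.items():
        if field not in query:
            raise KeyError(f"form field {field!r} is not part of {query_name}")
        query[field] = value
    missing = [field for field, value in query.items() if value is None]
    if missing:
        raise ValueError(f"missing values: {missing}")
    return query


ready = complete("emp-by-dept", {"value": 7, "display": "table"})
```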


Thus, the essential elements for partially predefined queries are the
relations containing the predefined part, correlation between the query and
the applicable form (i.e., stimulus), correlation between the query and the
form's description, and the description of the external representation
(i.e., geometry of the form). Each of these is specified in relations of
the DD/D.

Ad hoc queries are simply stated in RIPL_n, and the GEUF decomposes them
into RIPL_0 as described under Generalized End-User Facility (GEUF).

Any popular method of man-machine interaction is accommodated by the
above descriptions and processable by the GEUF. For example, a function
key may be defined as the stimulus to execute a predefined query that
displays a form; the form may be declared as the stimulus to execute one or
more other queries that return either data, another form, or both, creating
a dialogue or computer-aided instruction (CAI). Because the query is
independent of both the form's geometry and the stimulus, forms may be
changed with respect to the external representation, and other stimuli may
be changed, without affecting the query or the display of its results.

Stimulus Specifications - Stimuli that initiate predefined queries are
specified in relations stored in the DD/D. Stimuli may be either externally
supplied (e.g., forms, function keys, light pen, external event monitors,
etc) or internally generated (e.g., clock time, successful or unsuccessful
execution of another query, results of another query, etc). Trigger queries
are provided by declaring the stimulus for a predefined query to be the
execution of another predefined query. Real-time queries are established by
declaring the stimulus for a predefined query to be internal clock intervals
(e.g., once every second). Alerting is accommodated by declaring the
stimulus for the predefined query that determines if the alert condition
exists to be either clock-time intervals or execution of any queries that
can affect alert conditions, whichever is applicable.
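The relationship between stimuli, trigger queries, and alerting can be
illustrated with a small table-driven dispatcher. This is a sketch under
stated assumptions, not the GEUF design: the table STIMULI stands in for the
stimulus relations of the DD/D, and the stimulus and query names, as well as
the "executed:" naming convention, are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for the DD/D stimulus relations:
# stimulus name -> predefined queries that the stimulus initiates.
STIMULI = {
    "function_key_1": ["show_form"],               # externally supplied
    "clock_1s": ["check_alert"],                   # real-time query
    "executed:check_alert": ["notify_operator"],   # trigger query
}


def dispatch(stimulus: str) -> list[str]:
    """Return the queries executed, in order, for one incoming stimulus."""
    executed = []
    seen = set()
    pending = deque([stimulus])
    while pending:
        current = pending.popleft()
        for query in STIMULI.get(current, []):
            if query in seen:  # guard against cyclic trigger declarations
                continue
            seen.add(query)
            executed.append(query)
            # Execution of a query is itself an internally generated stimulus.
            pending.append(f"executed:{query}")
    return executed


if __name__ == "__main__":
    print(dispatch("clock_1s"))
```

Because the correlation lives in a table, a stimulus can be reassigned to a
different query without touching either the query or the dispatcher.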

Display Formats - The description of display formats is specified by
DIAM descriptions, extended to allow descriptions of two-dimensional
displacements, stored as relations in the data directory. The information
structure over which descriptions are made is the relations resulting from
the corresponding RIPL query. Thus, the display of a query may be changed
without affecting the query itself.

The description of user-supplied data is also specified by DIAM
descriptions. The information structure over which they are declared is the
relations that describe the partially predefined query, and the DIAM
descriptions need only specify the external representation of attributes
that are subject to user specifications.

Integrity Assertions - Integrity assertions are defined in RIPL
predicates in terms of the applicable relations and attributes, and are
stored in the data dictionary. Assertions range from simple attribute value
range integrity (e.g., value of employee salary must be between $10K and
$30K) to complex aggregate value specification (e.g., the sum of salaries of
any department must not exceed 20% of the sum of all departments). Set
operations are declared in terms of linking predicates. Thus, the value of
DEPT-NO of any employee in EMP must be in the set of DEPT-NOs in the DEPT
relation. Literal sets may also be declared; for example, EMP/SEX must be in
the set {MALE, FEMALE}.
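The kinds of assertion named above (value range, aggregate, linking
predicate, literal set) can be made concrete with a short check over sample
data. This is an illustrative sketch only: the relations are held as Python
lists, the sample tuples are invented, and the function names are
hypothetical; in RIPS the assertions would be RIPL predicates stored in the
data dictionary.

```python
# Invented sample data for illustration.
EMP = [
    {"E#": 1, "SAL": 12_000, "DEPT-NO": 10, "SEX": "FEMALE"},
    {"E#": 2, "SAL": 28_000, "DEPT-NO": 20, "SEX": "MALE"},
    {"E#": 3, "SAL": 15_000, "DEPT-NO": 10, "SEX": "MALE"},
]
DEPT = [{"DEPT-NO": 10}, {"DEPT-NO": 20}]


def violations(emp, dept):
    """Return (employee number, rule) pairs for failed tuple-level assertions."""
    found = []
    dept_nos = {d["DEPT-NO"] for d in dept}
    for e in emp:
        if not 10_000 <= e["SAL"] <= 30_000:      # attribute value range
            found.append((e["E#"], "salary range"))
        if e["DEPT-NO"] not in dept_nos:           # linking predicate
            found.append((e["E#"], "unknown department"))
        if e["SEX"] not in {"MALE", "FEMALE"}:     # literal set
            found.append((e["E#"], "sex literal set"))
    return found


def departments_over_share(emp, share):
    """Aggregate assertion: departments whose salary sum exceeds `share`
    (a fraction) of the sum over all departments."""
    totals = {}
    for e in emp:
        totals[e["DEPT-NO"]] = totals.get(e["DEPT-NO"], 0) + e["SAL"]
    grand_total = sum(totals.values())
    return sorted(d for d, t in totals.items() if t > share * grand_total)


if __name__ == "__main__":
    print(violations(EMP, DEPT))
    print(departments_over_share(EMP, 0.20))
```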

In general, integrity assertions may include the entire range of RIPL
queries. The assertions are processed by the GEUF as described under
Generalized End-User Facility (GEUF).

Authorization Constraints - Authorization constraints are defined as
RIPL predicates in terms of applicable relations and/or attributes in
Boolean combinations, and are stored in the data dictionary. Authorizations
may be declared for particular users or sources (i.e., terminals, etc) or
both. In general, authorization constraints may include restrictions on
any information that can be declared by RIPL statements. Authorization
constraints are processed by the GEUF as described under Generalized
End-User Facility (GEUF).

Derivations - Derivations are the declaration of named concepts
derivable from stored data, stored algorithms, or both. The purpose of
derivations is to provide the user with defined concepts that he can
reference in RIPL queries without having to derive them independently.
Derivations are expressed as RIPL statements over the information structure
or other derivations only, and are stored in the data dictionary. The use of
derivations extends the user's view of the information structure, allowing
him to write simpler queries (RIPL_n) without regard to how the data are
actually derived. The GEUF reduces the RIPL_n queries to RIPL_0 queries as
described under Generalized End-User Facility (GEUF). Any RIPL query that
can be expressed in terms of the information structure (IS) and other
derivations may be declared as a derivation and thereafter be viewed as a
relation or attribute of a relation.

For example, given the relations

     EMP(E#, SAL, DEPT#)

     DEPT(DEPT#, NAME, LOC)

representing stored data, and

     SUM(A', SUM-A)

representing the algorithm described earlier, users can derive the concept
"salary of departments," meaning the sum of salaries of all employees in
each department, as

     (1) GET S(DEPT) .OF. EMP/SAL .WHERE. DEPT# = DEPT/DEPT#

     (2) GET DEPT-SAL(DEPT) .OF. SUM/SUM-A .WHERE. A' = S(DEPT)/SAL

The derivation in (1) specifies the set (actually, bag) of salaries from the
EMP relation for each corresponding tuple in the DEPT relation, and (2)
specifies the sum of each set of salaries in the same context.

By declaring the above as a derivation in the DD/D, the DEPT relation
is extended to include the derived department salary as

     DEPT(DEPT#, NAME, LOC, DEPT-SAL)

Now, users may express queries in RIPL_n that reference the derived
attribute as simply

     GET S .OF. DEPT/DEPT-SAL .WHERE. ...

The RIPL preprocessor (see page 22) reduces such queries to RIPL_0 by
substituting the derivation statements (because they are already in RIPL_0)
for references to derived concepts, appending any user-supplied
qualifications as applicable.

Derivations may also be used in qualifications. Thus, to find all
departments whose salary is greater than X

     GET S .OF. DEPT/DEPT# .WHERE. DEPT-SAL > 'X'

which is similarly reduced to RIPL_0.
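The two-step derivation of department salary can be mirrored outside RIPL to
show what the reduction computes. This is a rough sketch, assuming the
relations are held as lists of tuples; the sample tuples (including the
department names and locations) are invented, and the function names are
hypothetical stand-ins for derivations (1) and (2).

```python
# Invented sample tuples.
EMP = [(1, 20_000, 10), (2, 15_000, 10), (3, 20_000, 20)]   # (E#, SAL, DEPT#)
DEPT = [(10, "TOOLS", "DENVER"), (20, "PARTS", "BOSTON")]   # (DEPT#, NAME, LOC)


def salary_bag(dept_no):
    """Derivation (1): bag of salaries of the employees of one department.
    A list is used so that equal salaries are kept (a bag, not a set)."""
    return [sal for (_, sal, d) in EMP if d == dept_no]


def dept_sal(dept_no):
    """Derivation (2): the stored SUM algorithm applied to the bag."""
    return sum(salary_bag(dept_no))


def extended_dept():
    """DEPT extended with the derived DEPT-SAL attribute."""
    return [(d, name, loc, dept_sal(d)) for (d, name, loc) in DEPT]


def departments_with_salary_over(x):
    """Counterpart of: GET S .OF. DEPT/DEPT# .WHERE. DEPT-SAL > 'X'"""
    return [d for (d, _, _, total) in extended_dept() if total > x]


if __name__ == "__main__":
    print(extended_dept())
    print(departments_with_salary_over(30_000))
```

The user-level query only mentions DEPT-SAL; the two helper functions play
the role of the substituted derivation statements.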

The example illustrates the power of derivations in simplifying
knowledge concepts for users while at the same time making the derivation of
the concept visible to management because it is stored in the DD/D.

This concept is extended to provide a subset of natural language by
allowing the derivation of relational operators. For example, given the
relations

     EMP(E#, D#, SAL, ...)

     DEPT(D#, NAME, LOC, ...)

     SALES(D#, PART, QTY, ...)

find the location of all departments that sell bolts.

The query can be stated in RIPL_0 as

     GET S .OF. SALES/D# .WHERE. PART = 'BOLT'

     PRINT T .OF. DEPT/LOC .WHERE. D# = S/D#

However, we can derive the concept 'sells' as the second-order relation

     SELLS(D#, PART')

by the statement

     SELLS(DEPT) = SALES/PART .WHERE. D# = DEPT/D#

declared as a derivation in the DD/D. RIPL_n now allows the same query to
be stated more naturally as

     PRINT S .OF. DEPT/LOC .WHERE. D# SELLS 'BOLT'

Again, the RIPL preprocessor reduces the query to RIPL_0 statements, using
the derivation in the reduction.

User Views - User views are the declaration of alternative views of the
extended information structure. User views may declare synonyms for
concepts (relations, attributes), restrict relations, limit projections of
relations, and may declare unnormalized relations for retrievals. The
purpose of user views is to allow users to state simpler queries by
presenting them with a view of the information structure tailored to their
special interests in their vernacular.

User views are declared by RIPL predicates over the information
structure, derivations, or other user views and may be declared for a
particular set of users or sources. Queries stated over user views (RIPL_n)
are reduced to queries over the basic information structure only by the GEUF
as described under Generalized End-User Facility (GEUF).
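A user view that combines synonyms, a restriction, and a limited projection
can be sketched as a small declaration that is applied when a query over the
view is reduced to the base relation. The declaration format, the sample
data, and the names VIEW and query_view below are all hypothetical; RIPS
declares views as RIPL predicates in the DD/D.

```python
# Invented sample base relation.
DEPT = [
    {"DEPT#": 10, "NAME": "TOOLS", "LOC": "DENVER"},
    {"DEPT#": 20, "NAME": "PARTS", "LOC": "BOSTON"},
]

# Hypothetical view declaration.
VIEW = {
    "base": DEPT,
    "synonyms": {"OFFICE": "NAME", "CITY": "LOC"},     # view name -> base name
    "restrict": lambda row: row["LOC"] == "DENVER",    # restricted relation
    "project": ["OFFICE", "CITY"],                     # limited projection
}


def query_view(view, where=lambda row: True):
    """Answer a query stated over the view by reducing it to the base relation."""
    result = []
    for base_row in view["base"]:
        if not view["restrict"](base_row):
            continue
        # Rename base attributes to the user's vernacular, then project.
        renamed = {v: base_row[b] for v, b in view["synonyms"].items()}
        view_row = {name: renamed[name] for name in view["project"]}
        if where(view_row):
            result.append(view_row)
    return result


if __name__ == "__main__":
    print(query_view(VIEW))
    print(query_view(VIEW, where=lambda r: r["CITY"] == "BOSTON"))
```

The user's qualification is written entirely in view terms (CITY), while the
restriction and renaming stay hidden in the declaration.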

Query Compiler/Translator (QC/T)

The QC/T (Reference 19) allows users to interact with the entire data and
algorithm resources of a computer network as though it were a single
integrated system when, in fact, it is distributed as a number of
independent and dissimilarly implemented systems. The QC/T accepts RIPL_0
queries and generates the required access programs automatically, as well as
the logic to synthesize the data returned by each pertinent database into
the response sought. The process is shown in Figure 6 and described below.

1)  Decompose the query according to the properties of the relational
    model into subqueries so narrow in scope that no more than one
    collocated homogeneous* system is necessary to resolve each
    concurrently, determining the logic to recompose the result into the
    third-normal-form (TNF) relations defined by the query;

2)  Compile each subquery according to the properties of the
    corresponding DIAM into an access subprogram that is semantically
    compatible with the pertinent database and algorithms;

3)  Recompose the access subprograms according to the DIAMs into the
    most comprehensive program possible for each pertinent system
    (concurrently determining the logic to decompose the resulting
    data into the TNF relations);

4)  Perform syntactic translation of each subquery according to the
    target DMS and transmit according to the protocol of the
    communications system.

19. L. S. Schneider: "A Relational Query Compiler for Distributed
    Heterogeneous Databases," Submitted for publication in ACM Transactions
    on Database Systems, January 1977.

 *  Because the essence of this process is to transform references to
    distributed heterogeneous information systems (which we can't process)
    into references that don't involve distributed heterogeneous
    information systems (which we can process), we need a term to describe
    an information system that isn't distributed heterogeneous. The opposite
    of heterogeneous is obviously homogeneous (similarly implemented).
    The traditional antonym for distributed is centralized, but this
    carries the wrong connotation for our use in that there is no central
    node in a distributed system. The real meaning we want to convey is
    "stored together" or "co-located," for which there is already an
    acceptable English word: collocated. Hence, the opposite of
    distributed heterogeneous for our purpose is collocated homogeneous.

The QC/T contains logic for selecting the most efficient access path
(with respect to total network resource use) as described in Reference 19.
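The decompose-then-recompose idea behind the QC/T steps can be shown with a
toy example in which the attributes of one relation are held at two sites.
This is only a sketch under strong assumptions: the catalogue SITES, its
contents, and the function names are hypothetical, compilation and syntactic
translation are collapsed into a single stand-in function, and access-path
selection is not modeled.

```python
# Hypothetical catalogue: which site holds which attribute of EMP, keyed by E#.
SITES = {
    "site_a": {"SAL": {1: 20_000, 2: 15_000}},
    "site_b": {"DEPT#": {1: 10, 2: 20}},
}


def decompose(attributes):
    """Group the requested attributes into one subquery per site."""
    plan = {}
    for attr in attributes:
        site = next(s for s, held in SITES.items() if attr in held)
        plan.setdefault(site, []).append(attr)
    return plan


def run_subquery(site, attrs):
    """Stand-in for compile, translate, and transmit: fetch from one site."""
    return {attr: SITES[site][attr] for attr in attrs}


def recompose(partials):
    """Join the per-site results on the key E# into a single relation."""
    rows = {}
    for partial in partials:
        for attr, column in partial.items():
            for key, value in column.items():
                rows.setdefault(key, {"E#": key})[attr] = value
    return [rows[k] for k in sorted(rows)]


def answer(attributes):
    """Answer a query as though the network were one integrated system."""
    plan = decompose(attributes)
    return recompose(run_subquery(site, attrs) for site, attrs in plan.items())


if __name__ == "__main__":
    print(decompose(["SAL", "DEPT#"]))
    print(answer(["SAL", "DEPT#"]))
```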

The GEUF provides the flexibility in how queries are stated by users,
reducing each to RIPL_0 before execution by the QC/T. The resulting received
data are compiled by the QC/T in a predetermined temporary format, and the
details of how they are to be displayed to the user are provided by the
GEUF. In both cases (mapping users' interfaces to RIPL_0 queries and mapping
the results to displays) the QC/T is again used under the direction of the
GEUF. The recursive use of the QC/T is discussed in the next paragraph.

Generalized End-User Facility (GEUF)

The GEUF (Reference 18) performs an invariant sequence of operations each
time a stimulus is received. The effect of the sequence is to direct the
applicable processing according to application functions defined for that
stimulus. The corresponding functions are defined and stored in the data
dictionary, and thus, the operations center around retrievals of the
specifications and, using the specifications, particularizing other queries
for the next sequential operation.

In general, if the stimulus is externally supplied, operation of the
GEUF results in a mapping of user-supplied data to a partially predefined
RIPL_n query or queries (which are subsequently reduced to RIPL_0 queries by
the GEUF query preprocessor) and the results, if any, are mapped to the
proper external device(s) according to formatting specifications. If the
stimulus is internally generated (e.g., clock time, results of another
query, etc), operation of the GEUF results in execution of other predefined
queries and, if required, the results of these queries are mapped to
external devices. Thus, the GEUF consists primarily of a stimulus monitor
and table-driven query generator.


In RIPS, the query compiler/translator performs retrievals, adds, etc
to a database by formulating a program in the language of the DBMS; it
performs the same functions when the database is the user's terminal in much
the same way, by formulating a program in the language of the device driver.
Thus, retrievals (of user-supplied data), changes (to particularize a
partially predefined query), and adds (to display the results) are directed
by the GEUF, compiled into executable programs by the query
compiler/translator, but performed by the DMSs and device drivers. The
recursive use of the query compiler/translator requires only that the GEUF
formulate proper RIPL_0 retrievals, changes, adds, and deletes, submitting
them to the query compiler/translator for execution. This process is shown
in Figure 7.

Figure 7. GEUF process.

The GEUF RIPL preprocessor modifies users' queries by appending the
integrity and authorization assertions in a similar manner. The GEUF submits
retrieval queries to obtain the applicable predicates from the DD/D and
change queries to modify the query. The concepts of query modification are
taken from those proposed in Reference 20. Similarly, RIPL_n queries are
reduced to RIPL_0 by retrieving the applicable declarations and
substituting them in place of RIPL_n references, maintaining all original
qualifiers.

20. M. Stonebraker: "Implementation of Integrity Constraints and Views by
    Query Modification," Proc. ACM SIGMOD International Conference on
    Management of Data, San Jose, California, May 1975, pp 65-78
    (ed. W. F. King).
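Query modification, in the sense used here, amounts to appending stored
predicates to the user's own qualification before the query is executed. The
sketch below assumes a query is represented as a small structure with a list
of conjunctive predicates; the representation, the AUTHORIZATION table, the
sample tuples, and the function names are hypothetical and do not reproduce
RIPL syntax.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Hypothetical DD/D contents: predicates keyed by (user, relation).
AUTHORIZATION = {("clerk", "EMP"): [("DEPT#", "=", 10)]}

# Invented sample tuples.
EMP = [
    {"E#": 1, "SAL": 20_000, "DEPT#": 10},
    {"E#": 2, "SAL": 25_000, "DEPT#": 20},
]


def modify(query, user):
    """Append the user's authorization predicates, keeping the original
    qualifiers intact. The query passed in is not changed."""
    extra = AUTHORIZATION.get((user, query["relation"]), [])
    return {**query, "where": list(query["where"]) + extra}


def run(query, rows):
    """Evaluate the conjunction of predicates and project one attribute."""
    def qualifies(row):
        return all(OPS[op](row[attr], value) for attr, op, value in query["where"])
    return [row[query["attribute"]] for row in rows if qualifies(row)]


if __name__ == "__main__":
    q = {"relation": "EMP", "attribute": "SAL", "where": [("SAL", ">", 15_000)]}
    print(run(modify(q, "clerk"), EMP))     # restricted to department 10
    print(run(modify(q, "manager"), EMP))   # no constraint declared
```

The user never sees the appended predicate; the modified query simply
returns fewer tuples for a constrained user.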

Functional View of RIPS

In Figure 8, we present a more complete view of RIPS in a distributed
information system environment than that shown in Figure 5. Figure 8
shows the concepts of viewing users' devices and the DD/D as part of the
database. Just as distributed systems or nodes of the network may be under
the control of dissimilar DMSs, so may users' devices be under the control
of different interfacing software or device drivers. Note that the DD/D
is shown to be under the control of a single local DMS, but it too may be
distributed.

The sequence of events performed by the GEUF for each stimulus is shown
grouped into three functional blocks. The first block accepts RIPL_n
queries or completes partially predefined queries by appending
user-supplied data, and reduces them to RIPL_0. The second block directs the
execution of RIPL_0 queries, and the third performs display formatting
functions. Reference 18 has a more detailed discussion of the GEUF
sequence of operations.

Database Management System Simulator

The DBMS simulator comprises two parts: a math model simulator
(References 6 and 7) and a real-time simulator (Reference 8), as described
below.

Math Model Simulator (MMS) - The DBMS simulator performs a discrete-event
simulation of the functions of the general class of database management
systems. To represent a broad class of such DMSs, the simulator is based
on an underlying canonic model, the Data-Independent Accessing Model I
(DIAM I). It was conceived as a tool to aid the study and application of
DMSs and allows the simulation of:

1)  A user's application in terms of its information structure and
    traffic rates;

2)  A candidate implementation of the application in a DMS, reflecting
    the proposed implementation of data relationships and recognizing
    any restrictions imposed by a specific DMS;

3)  A candidate host system representing pertinent aspects of the
    planned host computer and its operating system.

 6. L. S. Schneider and T. W. Connolly: "Generalized Data Base Management
    System Simulator," Proc. 1976 Winter Simulation Conference, Vol 2,
    December 1976 (ed. H. J. Highland, et al.).

 7. Martin Marietta Database Research Project: GDMS Math Model Simulator,
    Functional Specification, Design Specification, and User's Guide. NASA
    Contract Documentation, NAS9-13951, Johnson Space Center, Houston,
    Texas, September 1975.

 8. Martin Marietta Database Research Project: GDMS Real-Time Simulator,
    Functional Specification, Design Specification, and User's Guide.
    NASA Contract Documentation, NAS9-13951, Johnson Space Center, Houston,
    Texas, September 1975.

Use of the simulator permits studies relevant to the design and use
of DMSs. Typical studies of this nature might include:

1)  Comparison of the operating performance of two competing DMSs
    for the same application;

2)  Comparison of the operating performance of various implementations
    of an application using options available in a single DMS;

3)  Comparison of operating results based on differing host-system
    configurations being considered;

4)  Studies to enhance the user's knowledge and familiarity with DMS
    techniques.

The simulator consists of subsystems, each hierarchically subdivided
into a number of modules. Four of the principal subsystems correspond
generally to the four subsystem levels of the DIAM:

1)  Information-based model - Generates queries representing the
    application under study and maintains data population statistics;

2)  Structure-based model - Accepts the queries as Representation-
    Independent Accessing Language (RIAL) statements and uses
    definitions of implemented access paths to produce Representation-
    Dependent Accessing Language (RDAL) statements;

3)  Procedure-based model - Accepts the RDAL and produces a sequence
    of input/output accesses based on the definition of how and
    where access paths and data are stored;

4)  Host model - Represents host computer-system logic, including its
    peripherals and operating system, as it pertains to calculation
    of response time and resource use in processing I/O access
    requests.

An executive subsystem accepts control from the operating system at
execution time, and contains modules to read and store simulation data,
configure the simulation, control experimental runs, and produce the
required output.

The entire simulator has been programmed in the FORTRAN language with
very few deviations from ANSI FORTRAN standards. The source code has been
implemented on CDC 6000, Univac 1100, and IBM 370 systems operating in a
batch environment.
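The flavor of the host-model calculation can be conveyed by a very small
event-list simulation. This is not the MMS (which is written in FORTRAN and
models all four DIAM levels); it is a deliberately reduced Python sketch
that assumes a single-server host, a fixed number of I/O accesses per query,
and a fixed time per access. All names and parameters are hypothetical.

```python
import heapq


def simulate(arrivals, accesses_per_query, access_time):
    """Return the response time of each query on a single-server host.

    arrivals            query arrival times (any order)
    accesses_per_query  I/O accesses produced for each query
    access_time         host time needed for one I/O access
    """
    # Event list ordered by arrival time; ties are broken by query index.
    events = [(t, i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)

    host_free_at = 0.0
    response = {}
    while events:
        arrival, query_id = heapq.heappop(events)
        start = max(arrival, host_free_at)   # queue if the host is busy
        host_free_at = start + accesses_per_query * access_time
        response[query_id] = host_free_at - arrival
    return [response[i] for i in range(len(arrivals))]


if __name__ == "__main__":
    print(simulate([0.0, 1.0, 5.0], accesses_per_query=4, access_time=0.5))
```

Changing accesses_per_query corresponds loosely to comparing candidate
implementations; changing access_time corresponds loosely to comparing host
configurations.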

Real-Time Simulator (RTS) - The real-time simulator is designed to
provide empirical baseline data to support simulation experiments conducted
with the MMS. The primary need for these data is to support calibration
of the MMS in new experimental situations, particularly where empirically
derived analytic functions are being employed in the predictive process.
This can be effectively satisfied by a GDMS "test bed" in which stimulus
can be controlled and resulting performance measured.

Principal software components of the real-time simulator include:

1)  The Database Manager - A generalized data management system
    that is a candidate being evaluated for a given application;

2)  An Input Load Unit - A program component that controls (or
    generates) a traffic load for processing by the DBMS. When the
    actual database for the application under study does not exist,
    significantly distributed symbolic instances are automatically
    generated in accordance with the QDD static descriptions.
    Similarly, transactions described by the QDD dynamics can be
    generated, ensuring that the application descriptions for both
    the math-model and real-time simulations correspond;

3)  An Instrumentation Unit - A program component that measures
    and reports the time and resources used by the host system
    for processing each item of traffic (assumed to be vendor
    supplied);

4)  An Analysis Unit - A program component that controls the
    experimental process in the RTS and produces the required
    outputs for external use.

The real-time simulator resides in the actual host computer and
operating system, and uses an actual GDMS together with input,
instrumentation, and analysis units. The resulting implemented test bed
permits measurement and evaluation of the GDMS under actual working
conditions. Instrumentation results provide values of environmental and
functional parameters that correspond to those used in the math-model
simulator.

DMS Software Evaluation Methodology

The purpose of the DBMS math-model and real-time simulators described
in the previous section is evaluation of software for quantifiable
performance characteristics. However, evaluation of software extends to
characteristics that are inherently unquantifiable. These include ease
of use, conformance to standards, vendor support, documentation, data
independence, and others.

The use of simulation imposes a formal approach to requirements
definition and performance analysis for quantifiable characteristics, and
the DMS software evaluation methodology extends this approach to include the
equally important unquantifiable issues. The essence of this methodology
is to establish such issues as constraints in the selection process and
either to eliminate candidates that are unable to satisfy these constraints
or to derive the cost of satisfying the constraints and add it to
the life-cycle cost profile.

Thus, for example, if vendor support is required, we must establish
a level of such support and obtain a commitment from candidate vendors.
If a required standard is not met by a candidate package or design, we
must obtain a cost for bringing it into conformance or eliminate it from
further consideration.

The objective of this approach is to compare candidates objectively
on an equal basis for their ability to satisfy the requirements and to
eliminate, as much as possible, the subjectivity involved in using such
techniques as rankings by weighted scores. This provides candidate
vendors or designers the opportunity to bring their products into
compliance, where practical, and reduces the selection criterion to one of
life-cycle cost.
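The selection rule described in this section (treat unquantifiable issues as
constraints, then either eliminate a candidate or add the cost of conformance
to its life-cycle cost) can be written down directly. The sketch below is an
illustration only: the candidate names, cost figures, constraint names, and
the function select are invented for the example.

```python
# Invented candidates; costs are in arbitrary units.
CANDIDATES = {
    "A": {"life_cycle_cost": 100, "satisfies": {"vendor support"},
          "cost_to_conform": {"standard": 30}},
    "B": {"life_cycle_cost": 120, "satisfies": {"vendor support", "standard"},
          "cost_to_conform": {}},
    "C": {"life_cycle_cost": 80, "satisfies": {"standard"},
          "cost_to_conform": {}},   # no commitment obtainable for vendor support
}


def select(candidates, required):
    """Return (cheapest surviving candidate, adjusted cost of each survivor)."""
    priced = {}
    for name, c in candidates.items():
        total = c["life_cycle_cost"]
        for constraint in required:
            if constraint in c["satisfies"]:
                continue
            fix_cost = c["cost_to_conform"].get(constraint)
            if fix_cost is None:
                break               # cannot conform: eliminate the candidate
            total += fix_cost       # add the cost of conformance
        else:
            priced[name] = total    # every constraint satisfied or priced in
    best = min(priced, key=priced.get) if priced else None
    return best, priced


if __name__ == "__main__":
    print(select(CANDIDATES, ["vendor support", "standard"]))
```

In this invented example the candidate with the lowest unadjusted cost is
eliminated, and the one that is cheapest before adjustment among the
survivors loses once the cost of conformance is added.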


SYNOPSIS OF KM REQUIREMENTS


This section lists KM requirements in Section 2 of Reference 1 and in
the appendix of Reference 2, taken in their order of appearance. References
are made to their appearance in the source documents, and cross-referenced
to the discussion of their allocation to RIPS components in this document.
Additional KM requirements are discussed in the following section in terms
of the KM logical system design.


No.  Requirement                                      Source &    Discussion
                                                      page ref    page ref

 1.  EA will require ... a powerful KM facility to
     keep track of the many data elements, file
     structures, databases, and flows that compose
     the knowledge of the corporation.

 2.  The EA should prepare some sort of model of      1. p 33     38
     the relationship of the enterprise to other      2. p 108
     organizations, including the volume,
     direction, and importance of various data and
     information flows. Similarly, the EA should
     develop a model of the macroview of the
     enterprise itself ... including tasking
     requirements, data, and information flows.

 3.  Selection guidelines will need to be             1. p 34     43
     established for all data management software.    2. p 109

 4.  Evaluation procedures, sample databases,         1. p 34     43
     benchmark tests, and checks on the consistency   2. p 109
     of a proposed system ... will need to be
     developed.

 5.  Production of useful guidelines for              1. p 34     43
     preparation of meaningful impact studies is      2. p 109
     an area needing further research.

 6.  ... costs of standardization must not be         1. p 35     38
     transferred to database end users by             2. p 109
     requiring them all to view data in exactly
     the same way. ... On the contrary, the goal
     of standardization should be to facilitate
     communication among different divisions of
     the enterprise by agreeing on standardized
     concepts, not standardized names.

 7.  ... the Knowledge Resource Center (KRC) ...      1. p 36     38
     should contain summary information about         2. p 111
     the other databases of the enterprise ...
     driven automatically by information from the
     other databases .... It should provide
     special user interfaces ... for handling the
     kinds of questions that top management asks.

 8.  An integral part of the KRC ... is a data        1. p 37     38
     dictionary/directory.                            2. p 111

 9.  In the area of security, the EA must ...         1. p 38     38
     take extraordinary measures to preserve the      2. p 112
     integrity and privacy of ... metadata.

10.  There is great need for tools for                1. p 39     43
     hardware/software tuning, schema design ....     2. p 113
     models of significant performance variables,
     ... to measure actual database use and
     compare it against original design
     specifications to determine when
     restructuring is warranted.

11.  A dynamic restructuring capability ... can       1. p 39     49
     be a significant performance factor.             2. p 113

12.  There is a need to develop an evaluation         1. p 39     43
     procedure for determining the suitability of     2. p 113
     various access methods for different
     applications.

13.  In the security area, there is a great need      1. p 39     51
     for improved security techniques in all          2. p 114
     layers of hardware/software.

14.  Devise a workable and agreed-upon technique      1. p 40     50
     for locking and for resolving deadlocks.

15.  Rapid recovery from failure (is required).       1. p 40     33
                                                      2. p 114

16.  Adequate audit-trail capabilities for            1. p 40     33
     back-up, integrity checking routines for         2. p 114
     maintenance, and restoration tools for
     recovery need to be developed.

17.  A technique for automatically checking the       1. p 40     38
     semantic consistency of data.                    2. p 114

18.  Tools are needed to improve the ... process      1. p 40     43
     ... of the physical mapping of the logical       2. p 114
     database to physical devices.

19.  A methodology is needed for determining in       1. p 40     43
     advance the expected size of a database as       2. p 114
     well as performance characteristics.

20.  Models are necessary to allow simulation of      1. p 40     43
     the effects of certain parameter changes on      2. p 114
     the performance of the system.

21.  Schema navigation tools are needed to assist     1. p 40     38
     the DBA in perusing and altering existing        2. p 114
     schemata.

22.  Automated procedures for migrating existing      1. p 40     38
     databases to new hardware or software are        2. p 114
     essential.

23.  Techniques are needed for automatically          1. p 40     49
     generating schemata from diagrams or sample      2. p 114
     programs.

24.  A method for keeping track of multiple           1. p 40     38
     versions of a schema would assist in             2. p 114
     maintaining a database whose structure
     changes dynamically.

1.  James  F.  Berry  and  Craig  M.  Cook:  Managing  Knowledge  as  a Corporate 
Resource.  Contract  Source  Document,  Version  4.5,  28  May  1976. 

2.  James  F.  Berry  and  Craig  M.  Cook:  Viewing  Knowledge  as  a Resource  in 
Federal  Departments  of  the  U.S.  Government.  Economic  Research  Service, 
U.S.  Department  of  Agriculture,  September  1977. 



25.  Develop a procedure for determining if        1. p 40
     the internal schema as designed meets the     2. p 114    43
     user's or AE's requirements.

26.  Techniques for easing data migration or       1. p 41
     roll-over of application from one system      2. p 115    38
     to another.

27.  Devise an effective scheme for keeping        1. p 41
     copies of a database in synchrony.            2. p 115    38

28.  Develop ways of dynamically allocating        1. p 41
     network resources (e.g., storage,             2. p 115    48
     communication facilities, data management
     capabilities, etc).

29.  Develop and employ subnetwork models          1. p 41
     within a computer network.                    2. p 115    38

30.  Provide local and global views of a           1. p 41
     database to enhance performance.              2. p 115    38

31.  Research needs to be done on how to allow     1. p 42
     the user to make an easy transition from      2. p 115    38
     one interface to another.

32.  Increase the effectiveness of data            1. p 42
     presentation . . . such as superimposing      2. p 116    38
     images over pictures (e.g., slides or
     television).

33.  An automatic exception-reporting capability   1. p 42     38
     in which an alerter is triggered when
     certain user-specified conditions occur.

34.  Methods of summarizing data from numbers,     1. p 42
     graphs, or text, and presenting summaries     2. p 116    51
     need to be investigated.

35.  End users and AEs have a need for a           1. p 42
     navigation facility which will allow them     2. p 116    38
     to browse through a database with an
     unknown schema.

36.  The database must be capable of               1. p 42
     instructing the user as to its structure      2. p 116    38
     and use.

37.  Techniques are needed for determining the     1. p 42
     'optimal' path to a data item.                2. p 116    38

38.  Methods are needed for automatically          1. p 42
     generating the necessary access code.         2. p 116    38

39.  Preretrieval query analysis and simulation    1. p 42     38
     can help conserve machine as well as human    2. p 116    43
     resources.

40.  The techniques of computer-aided instruction  1. p 42
     should be applied to the task of informing    2. p 116    38
     the user how to best use the resources
     available.

41.  Initial validation of input data must occur   1. p 42
     before the data are entered into the          2. p 116
     database . . . using methods of error
     detection and correction .... Once the data
     are entered, further validation should be
     performed as an integrity check.

42.  A scheme is needed to validate derived        1. p 43
     data or the algorithm used.                   2. p 116    59

43.  Validation techniques are needed for testing  1. p 43
     the consistency of the conceptual,            2. p 116    43
     internal, and external schemata.

44.  A method of associating a validity value      1. p 43
     with each data element and with databases     2. p 116    45
     in general needs to be developed so that
     meaningful validities can be assigned to
     information derived from multiple sources.

45.  The capability to check context integrity     1. p 43
     before releasing information is needed.       2. p 117    38

46.  The ability to validate queries before        1. p 43
     they are executed.                            2. p 117    47

47.  Efficient techniques for handling the entry   1. p 43
     of large volumes of data are needed.          2. p 117    43

48.  Standard data entry techniques would reduce   1. p 43
     training and increase efficiency.             2. p 117    38

49.  Tools are needed to model the real world,     1. p 43
     to develop and test hypotheses about the      2. p 117    38
     real world based on the data available,
     and to project the implications of a
     hypothesis about the real world through
     some simulation mechanism.

50.  Improvement is needed in question             1. p 44
     presentation, interactive question            2. p 117    45
     enhancement, application of probabilistic
     rules (fuzzy logic), basic inferencing
     techniques, answer presentation, proof
     demonstration, and inductive inferencing of
     rules from sample data (with application to
     trend analysis).

51.  The use of time dependencies on queries       1. p 44
     in order to keep straight the accession       2. p 117    44
     of archival databases.

KM  LOGICAL  SYSTEM  DESIGN  CORRELATION  TO  RIPS 

In this section, we discuss the RIPS concepts as related to the KM
logical system design described in Reference 2. The requirements of each
subsystem of the KM design are addressed, along with the requirements
listed in the preceding section. The subsystems proposed by KM are the
Factual Knowledge Subsystem, Procedural Knowledge Subsystem, Judgment
Support Subsystem, and the Translation and Control Subsystem. In
addition, we include discussion of a simulation subsystem and its
relationship to the others to address the KM requirements for performance
analysis.


Factual  Knowledge  Subsystem  (FKS) 

The FKS (Data Management Subsystem in Reference 1) is viewed in the
KM concept as the DBMS software or access engines. In RIPS, the
query-compiler algorithm can traverse a database schema specified by DIAM
description and, thus, if the results of a query compilation were
machine-language I/O instructions, the query compiler would serve as an
access engine. However, in the environment intended, the results of a
query compilation are translated into the language of whatever DBMS or
access engine is implemented. In RIPS, functions required of the KM Data
Management Subsystem are performed by existing DMSs that comprise the
nodes of the information system network, and the RIPS query
compiler/translator is functionally a part of the KM Translation and
Control Subsystem's logical design. RIPS operates in a distributed
heterogeneous database environment and imposes no requirements on
existing systems to be brought into compliance with some arbitrary
standardized implementation. Conceptually, a RIPS package can be installed
at one or more nodes in the network without changing the existing
implementation, allowing access to distributed data and processes that
require it without the user's regard as to how or where they are
implemented.

Rapid recovery from failure (R-15) is primarily a requirement of the
DMS installed at each node in the network, as are database restoration
(R-16) and reorganization. However, the RIPS must maintain a transaction
log in case some system in the node discovers that erroneous data may have
been supplied to previous queries. In such cases, the RIPS must perform
a corresponding recovery and restoration with respect to the queries it
has originated.
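
As an illustration only, the role of such a transaction log might be
sketched as follows in Python. The record layout (query_id, node, time)
and the function name affected_queries are assumptions made for this
sketch; the source does not specify a log format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record: one query answered with data from one node."""
    query_id: int
    node: str   # node whose DMS supplied the data
    time: int   # logical time at which the data were supplied

def affected_queries(log, node, bad_from, bad_until):
    """Return ids of queries that received data from `node` during the
    interval in which that node's data are now known to have been wrong."""
    return [e.query_id for e in log
            if e.node == node and bad_from <= e.time <= bad_until]

# Sample log (invented values, for illustration only).
LOG = [LogEntry(1, "node_a", 10), LogEntry(2, "node_b", 12), LogEntry(3, "node_a", 15)]
```

With this sample log, a report that node_a supplied erroneous data between
times 12 and 20 would single out query 3 for recovery.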

An operating system-supplied file structure that will allow users to
create and keep their own personal files, which may not be part of the
knowledge resource, is required by KM. While RIPS accommodates this
capability technically, some of the power of the KM concept will be
jeopardized. In RIPS, the database is characterized by a profile of the
data and processing requirements via the quantitative data descriptions
in the DD/D. Because the QDD is the one source of information regarding
performance analyses, organizational information flows, etc, any
implementations not recorded will not be included in the analyses,
reducing their fidelity. If the QDD profiles are included in the DD/D,
the corresponding data automatically become part of the knowledge
resource, and even though the data may not require active management of
the EA, the data will require active management by the DBA. However, the
extent to which QDD profiles are maintained is an installation-dependent
decision, and whatever fidelity is justified can be accommodated.

A text-editor user interface is required in KM and is discussed under
Text Processing.

Procedural  Knowledge  Subsystem  (PKS) 

The PKS's task is to manage the procedural knowledge of the knowledge
resource. In RIPS, this function is provided by representing algorithms
and application programs of an existing system as relations in the IS,
accessible through RIPL, tailored to end users by the GEUF. Details of
where the algorithms are located and how they are executed are described
in the data directory and are used by the QC/T in formulating programs
for execution and, at the same time, are visible for management.
Functionally, the resulting program doesn't differ from a program
generated as in the previous paragraph, and thus, the program is executed
by existing DMSs that comprise the nodes of the information system
network, and the RIPS QC/T is functionally part of the KM Translation and
Control Subsystem's logical design.

The heuristic component of the KM PKS is described as existing
knowledge-based systems. Interfacing with such systems in RIPS should
present no additional problems. At the information level, RIPS does not
distinguish resources in the network by their originally intended purpose
or by their method of implementation. Rather it concentrates on
information available from the node that is to be made accessible to
other nodes. The knowledge-based systems listed in Reference 2 have
one thing in common: a database and products generated from the
application of rules and algorithms (which are also data in some context)
over the database. Either the final products themselves or some portion
of the contents of the database used in these productions, or both,
need to be accessed by other nodes. Otherwise, there is no reason to
include such systems in the network. Whatever information is to be made
available is represented in the RIPS information structure. For example,
if only the products of an existing system are to be included, a single
relation may suffice, as perhaps

PRODUCT  (PROD-ID,  TIME,  ...) 

and suitable partially predefined query(s) and external interfaces can be
tailored to the using environment. Of course, details of where the
application is implemented, how it is executed, and the translation
descriptions are entered in the DD/D.

If only part of the data used in the production is to be made
accessible, then only a description of the data need be included in the
Information Structure, for example

TABLE-NAME  (ID,  V1,  V2,  ...)

and  the  corresponding  supporting  descriptions  entered  in  the  DD/D. 

However, the real justification of the KM concept is the more difficult
case in which data or applications from one node are needed in
conjunction with data or applications from others to satisfy a single
query. It is this environment that RIPS envisions.

A basic precept in the RIPS philosophy is that the semantics of
functions must be separated from the implementation details to make the
knowledge widely available and manageable. This includes stored data,
algorithms, application programs, or entire systems. The degree to which an
organization's knowledge is to be made available is properly the subject
of organizational management, and whatever choice is made must be
supported technically by RIPS.

Judgment  Support  Subsystem  (JSS)

The KM JSS (User Interface Subsystem in Reference 1) logical system
design requires flexible user-query-statement and data display techniques.
The flexibility is provided in RIPS by the Generalized End-User Facility,
which directs the mapping of external representations to queries in terms
of the information structure and mapping of the results of queries to
external representations or displays. Thus major functions required in the
KM logical design are allocated to the KM Translation and Control
Subsystem discussed in the next section.

In RIPS, there is a high degree of symmetry between the functions of
database and end-user device management. By viewing the end-user device
as a database, data stored in it at any particular moment can be described
in the same terms (DIAM descriptions) as data stored in the internal
database. Because the process of reading from, or writing to, user's
devices is essentially the same as reading from or writing to database
storage devices, the algorithm that traverses the DIAM descriptions of
data storage implementations can also traverse the DIAM descriptions of
display formats. In the RIPS view, a user's retrieval query from the
internal database is simultaneously an add query to the external database
or user's device. This concept, discussed in Reference 18, is summarized
below.

When the query compiler receives a retrieval query, it must determine
how and where the required data are stored. This information is provided
by DIAM descriptions stored in the data directory. The query compiler
uses these descriptions to formulate a program to retrieve the data. The
program is translated into the language required by the DMS that controls
the database(s), and the data are returned to the compiler, which then
assembles the data into a temporary storage. Because the result of each
RIPL statement is a relation, the temporary storage contains the relations
derived by the query. Thus, a RIPL query contains, implicitly, the
relational or conceptual view of the results.

Now the reverse process must take place. Relations generated by the
query must be added to the external database. The GEUF automatically
generates the ADD query and submits it to the query compiler. When the
compiler receives an add query, it must determine how and where the
data are to be stored. This information is the formatting specifications
supplied by the user as DIAM descriptions and stored in the data directory.
The query compiler uses these descriptions to formulate a program to store
(display) the data accordingly. The program is translated into the language
required by the device driver or operating system that controls the device.
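
The retrieve-then-add symmetry described above can be sketched in a few
lines of Python. This is a minimal sketch under stated assumptions, not
the RIPS implementation: the Description class is a hypothetical stand-in
for a DIAM description, and the names DIRECTORY, STORES, retrieve, add,
and run_query are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Description:
    """Hypothetical stand-in for a DIAM description of one relation."""
    store: str      # which store or device holds (or shows) the relation
    fields: tuple   # attribute names in stored or displayed order

# Hypothetical data directory: relation name -> description.
DIRECTORY = {
    "ROOM": Description("motel_db", ("room_no", "rate")),
    "ROOM_DISPLAY": Description("terminal", ("rate", "room_no")),
}

# The internal database and the user's device are both treated as stores.
STORES = {"motel_db": [(101, 45.0), (102, 60.0)], "terminal": []}

def retrieve(relation):
    """Read stored tuples and return them as a relation (list of dicts)."""
    d = DIRECTORY[relation]
    return [dict(zip(d.fields, row)) for row in STORES[d.store]]

def add(relation, rows):
    """Write a relation to a store, laid out as its description says."""
    d = DIRECTORY[relation]
    for row in rows:
        STORES[d.store].append(tuple(row[name] for name in d.fields))

def run_query(source, display):
    """A retrieval from `source` is completed by an add to `display`."""
    add(display, retrieve(source))
    return STORES[DIRECTORY[display].store]
```

The same two routines serve both directions; only the descriptions in the
directory differ between the database side and the display side.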

In RIPS, this concept is extended to map user-supplied data via
whatever interfacing technique is desired (forms, menus, light pen, etc) to
partially predefined RIPL queries.

The symmetric view of the internal and external mappings allows
recursive use of the query compiler/translator software and thus provides
end-user interface independence at the external level just as it provides
database implementation independence at the internal level.

The requirement to provide a set of common utilities, such as sorting,
report generation, etc, accessible by the user is implicit in the RIPS
concept, which makes the use of such utilities clear to users by requiring
only that the format of displays be described, not the procedures for how
the formats are to be generated.

The KM requirement for a "universal interface," providing the user
with a common language for selecting various interfaces, is described in
the next section, as are the eleven interfacing techniques specified by
the KM logical system design.

In  addition,  a Knowledge-Based  Personal  Assistant  (KBPA)  is  required 
to  aid  users  by  providing  substantial  knowledge  of  what  a particular  user 
needs  to  do  his  or  her  job.  In  RIPS,  this  requirement  is  satisfied  by  the 
GEUF  and  the  concepts  of  derivations,  user  views,  and  partially  predefined 
queries,  as  discussed  below. 

In an organizational environment, individual users of the information
system are not free to perform operations over the information at
will. Their use of the system is constrained by the purpose of their job,
just as their use of physical resources is. However, the use of automated
information has added, or at least changed, tasks that are necessary
for their job; that is, they must query or update databases. The degree
to which this constrained interaction corresponds semantically with their
job largely determines the success of the system.

For example, a motel reservation clerk in New York, when asked to
reserve a room at the Downtown Atlanta Motel, performs the task by making
an entry via a terminal. The transaction is stated in terms of reserving
a room. It is not viewed as updating the database, even though that is
precisely what is being done and may, in addition, require computational
or other algorithms (i.e., internal knowledge) to determine whether a room
is available, of which the clerk is totally unaware.

Well-designed user languages make extensive use of verbs in the
vernacular of the user community for the dual purpose of aligning the
semantics with the job and constraining the operations to just those
required. They do this by providing application programs that recognize
only these functions, and translating the requests into database and
algorithmic operations. Thus, while the motel clerk may reserve a room in
Atlanta, he cannot assign a particular room, add new rooms to the motel,
or determine who is in a particular room. But these operations are
meaningful and required for some users in the network.

In RIPS, both the semantics and the constraints are implemented through
the use of partially predefined queries, derivations, and user views.
Whatever knowledge or rules are to be applied are specified as derivations
(for general use) or in the partially predefined queries (for particular
use). Because they are stored in the DD/D, their use is controlled
through specification of authorization constraints. The interface, or
user's language, to these partially predefined queries is tailored by the
declaration of formats to require only user-supplied values to
particularize the query for the current task. This includes whatever
external representations are called for in the vernacular and in the
users' environments.
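
A minimal Python sketch of a partially predefined query, using the motel
example, is given below. Everything in it is an assumption made for
illustration: the PREDEFINED table stands in for DD/D entries, the role
names are invented, and the query text is a made-up notation, not actual
RIPL syntax.

```python
from string import Template

# Hypothetical DD/D entries. Each partially predefined query carries its
# fixed text, the values the user must supply, and the authorized roles.
PREDEFINED = {
    "RESERVE_ROOM": {
        "template": Template(
            "ADD RESERVATION (MOTEL=$motel, NIGHT=$night, GUEST=$guest)"),
        "params": {"motel", "night", "guest"},
        "roles": {"reservation_clerk", "motel_manager"},
    },
    "ASSIGN_ROOM": {
        "template": Template("CHANGE RESERVATION (GUEST=$guest, ROOM=$room)"),
        "params": {"guest", "room"},
        "roles": {"motel_manager"},
    },
}

def particularize(name, role, **values):
    """Fill in a stored query with user-supplied values, after checking
    the authorization constraint recorded with the query."""
    entry = PREDEFINED[name]
    if role not in entry["roles"]:
        raise PermissionError(f"{role} may not use {name}")
    missing = entry["params"] - set(values)
    if missing:
        raise ValueError(f"missing values: {sorted(missing)}")
    return entry["template"].substitute(values)
```

In this sketch the clerk can particularize RESERVE_ROOM but is refused
ASSIGN_ROOM, which mirrors the constraint described for the motel clerk.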

Today, application programs can provide whatever interface and
constraints are needed, incorporating the knowledge to assist the user.
But they are bound to the current database implementation; new information
requirements are difficult to accommodate even though the necessary data
are already available; and the internal knowledge is not readily accessible
to management. In RIPS, the partially predefined queries are not bound to
the implementation, either external or internal; new requirements are
easily defined as updates to the DD/D via RIPL (or tailored forms); and the
internal knowledge is readily accessible through RIPL queries to the DD/D.

Again, this flexibility is in keeping with the RIPS philosophy stated
under Procedural Knowledge Subsystem (PKS).

Translation  and  Control  Subsystem  (TCS) 

Requirements in the logical system design of the Translation and
Control Subsystem are allocated to RIPS components as follows. The
canonical form for data structures and data formats is satisfied by the
RIPS information structure and DIAM descriptions of implementations stored
in the DD/D. The mapping mechanism is provided by the query
compiler/translator and the RIPL preprocessor, using the specifications
stored in the DD/D. The enterprise knowledge resource is satisfied by
recursive use of RIPS concepts in the management of the DD/D and
distributed data. Tailored user interface techniques are provided by the
GEUF, including the RIPL language, RIPL preprocessor, and recursive use of
the QC/T. In the following paragraphs, some details are provided for each
requirement, including those mentioned in the Introduction.

The conceptual model of the enterprise's data is provided by the RIPS
information structure. The canonic model employed in RIPS provides a view
of stored data, including data stored in the DD/D, and stored algorithms.
In conjunction with the RIPL language, the information structure satisfies
R-49 by providing a sufficient model of the real world over which
hypotheses can be developed. In this regard, an important RIPS concept is
inclusion of QDD parameters in the information model. This provides
visibility of the populations; population distribution; and arrival,
change, and departure rates of entities in the real world of interest.
This model of the dynamics of the real world is valuable knowledge, not
only from the standpoint of specific production information processing
(e.g., at what rate do competitors enter and leave the fields?) but also
in the implementation decisions owing to the dynamics. The adequacy and
fidelity of the model and hypotheses can be tested by periodically
performing real-world experiments and comparing them with the current
information model and QDD statistics.

The information structure is the foundation for satisfying R-26
because both stored data and algorithms are represented, and RIPL queries
remain stable under the migration of either.


The requirements for data translation (R-22, R-26) are satisfied
through recursive use of the query compiler and GEUF, which together
form a data translator. In normal use, a user's retrieval query is
simultaneously an add query in which the results are added to an external
database (display) according to DIAM descriptions of the format. This
symmetry allows the external database to be another internal database
because DIAM descriptions and the compiler's operations are independent
of the device. Only the translator is aware of any difference, and thus,
translation of one internal database to another requires only DIAM
descriptions of both, and applicable RIPL queries to retrieve whatever
portion of the source database is desired with the correlation of
whatever target database is required.

The RIPS information structure includes the three-part association of
all entity name sets, or in relational terms, the
relation-name/domain-name/attribute (role) name association. RIPL queries
including adds and changes over the information structure can be
automatically checked for semantic consistency (R-17) to the extent that
the domain-name/role-name compatibility can be assured. For example, a
query that compares the role age with the role street number may be
numerically legal (i.e., both are integer numbers), but is semantically
questionable because they are from different domains. Further checks on
context integrity (R-45) are provided by the QC/T query decomposition
process. Queries that cannot be decomposed in terms of the information
structure are ambiguous.
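
The domain-name/role-name check can be illustrated with a short Python
sketch. The table ROLE_DOMAIN and the relation, role, and domain names in
it are hypothetical; the sketch assumes only that each role is recorded
with the domain from which its values are drawn.

```python
# Hypothetical fragment of an information structure:
# (relation name, role name) -> domain name.
ROLE_DOMAIN = {
    ("PERSON", "AGE"): "YEARS",
    ("PERSON", "STREET_NUMBER"): "HOUSE_NUMBER",
    ("EMPLOYEE", "SENIORITY"): "YEARS",
}

def comparable(left, right):
    """Two roles may be compared only if they draw on the same domain,
    whatever their machine representation happens to be."""
    return ROLE_DOMAIN[left] == ROLE_DOMAIN[right]
```

Here age and street number are both integers, yet the comparison is
flagged because their domains differ, while age and seniority pass.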

The RIPS DD/D (R-8) contains all the specifications required by the
GEUF and the query compiler/translator, including the information
structure, which in turn includes the information structure of the
specifications themselves as part of the database. R-1 requirements are
satisfied by providing access and management of the "data elements, file
structures, databases, and flows that comprise the knowledge of the
corporation." General requirements for metadata management are provided by
the GEUF, tailoring the interface to whatever technique is called for, and
providing access to either metadata or stored data in a consistent manner
(R-6).

The integrity and privacy of metadata (R-9) are provided in the same
manner and to the same degree as for stored data. Access to the data
dictionary provides the tools for the DBA to peruse and alter existing
schemata (R-21), and again, tailored user-oriented interfaces are
provided in keeping with the DBA's skills and needs. The data directory,
describing how and where the data are stored, provides a view of
subnetworks within the computer network as required by R-29. Local views
of the database (R-30) are implemented by declaring user views and
partially predefined queries.

The QDD provides both a macroview of the enterprise (R-2) and a
microview (R-29) characterizing the data flow for individual users. A part
of the DD/D, the QDD can be maintained to whatever concurrency provides the
fidelity required through the use of 'trigger' queries that update the QDD
based on receipt and execution of other queries.
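
One way to picture a 'trigger' query is as bookkeeping that runs alongside
every executed query. The Python sketch below assumes, purely for
illustration, that the QDD keeps a usage count per relation; the names QDD
and execute are invented and the actual QDD contents are not specified
here.

```python
from collections import Counter

# Hypothetical QDD statistic: number of queries executed per relation.
QDD = Counter()

def execute(relation, run):
    """Run a query against `relation`, then fire the trigger that
    records the execution in the QDD."""
    result = run(relation)
    QDD[relation] += 1
    return result
```

How often such counts are refreshed would be an installation decision, in
line with the fidelity trade-off described above.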


Mappings to internal implementations are performed in RIPS by the RIPL
query compiler/translator, operating over DIAM descriptions of
implementations. DIAM descriptions, stored in the DD/D, also provide the
DBA with knowledge and visibility of the various implementations (R-21,
R-24) and, in conjunction with the QDD, provide the compiler with accessing
and quantification parameters needed for optimization (R-37, R-39). The
compiler/translator automatically generates the necessary access code
(R-38) for the applicable DMS.

Where  data  are  redundantly  stored  in  the  network,  the  query  compiler's 
optimizer  determines  which  access  paths  to  use  for  efficient  resource  use. 
For  additions,  deletions,  and  changes,  all  instances  are  maintained,  thus 
satisfying  R-27. 
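
The treatment of redundantly stored data can be sketched as follows in
Python. The REPLICAS table, its node names, and the single "cost" figure
per node are assumptions made for illustration; a real optimizer would
weigh the accessing and quantification parameters held in the DD/D.

```python
# Hypothetical copies of one relation, each with an assumed access cost.
REPLICAS = {
    "node_a": {"cost": 5, "rows": {1: "alpha"}},
    "node_b": {"cost": 2, "rows": {1: "alpha"}},
}

def read(key):
    """Retrieve from the copy with the lowest assumed access cost."""
    node = min(REPLICAS, key=lambda n: REPLICAS[n]["cost"])
    return node, REPLICAS[node]["rows"][key]

def write(key, value):
    """Apply an addition or change to every copy so that all instances
    stay in synchrony."""
    for replica in REPLICAS.values():
        replica["rows"][key] = value
```

Reads touch one copy and writes touch all of them, which is the
asymmetry the paragraph above describes.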

The RIPL allows users to state what information is required, not how
the data are obtained. Queries remain stable regardless of how or where
in the network the required data are implemented, and changes to
implementations have no effect on the queries. Queries may include requests
for both derived data and stored data in a consistent manner so that a
change that replaces stored data with an algorithm for deriving the same
data, or vice versa, has no effect on queries.
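
The interchangeability of stored and derived data can be shown with a
small Python sketch. The relation EMPLOYEE, its attributes, and the helper
names are hypothetical; the point is only that the caller's request is
identical whichever way the directory says the attribute is obtained.

```python
# Hypothetical stored relation, keyed by employee number.
STORED = {"EMPLOYEE": {7: {"BIRTH_YEAR": 1940, "AGE": 38}}}

def stored(attr):
    """Implementation 1: the attribute is read from the stored row."""
    return lambda row, ctx: row[attr]

def derived_age(row, ctx):
    """Implementation 2: the attribute is computed by an algorithm."""
    return ctx["year"] - row["BIRTH_YEAR"]

def make_resolver(directory):
    """Build a lookup that consults the directory to decide how each
    attribute is obtained; the request itself never changes."""
    def get(relation, key, attr, ctx):
        row = STORED[relation][key]
        return directory[(relation, attr)](row, ctx)
    return get
```

Swapping the directory entry for AGE from stored("AGE") to derived_age
changes the implementation without changing the request.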

This implementation independence is provided by the RIPL language and
the RIPL query compiler/translator that automatically generates the
required accessing programs using the implementation specifications (DIAM
descriptions). Implementation descriptions are not restricted to automated
data, but extend to descriptions of manually stored data sources.
This concept provides a consistent methodology for the KM concept because
management of corporate information requires knowledge of the formal lines
of communication in the organization, whether automated or not. A query
that requires both computer-stored data and manually stored data can only
be answered by accessing both, and the methods of accessing the required
data must necessarily differ, but the methods of stating the information
requirement need not.

Techniques for accessing manually stored data are dictated by the use
of the data. If no computations, correlations to computer-stored data,
or special report formats are needed, the response to a query could
contain only a description of where the manually stored parts of the query
can be found (e.g., office number, file name, person, etc). Otherwise,
manually stored data comprising the answer must be entered in the
computer to produce the final product. This can be accomplished to whatever
degree of automation is demanded, including automatically issuing a
request to the manual data manager (e.g., clerk, librarian) and accepting
the answer via a terminal input, then completing the processing. If
manually stored data are subsequently automated, there is no effect on
users: queries remain stable and only response time may differ.




GEUF  concepts  accommodate  all  types  of  queries,  as  described  in  Ref- 
erence 18  and  summarized  below.  Ad  hoc  queries  are  simply  stated  in  RTPL 
at  the  time  desired.  Totally  definable  queries  that  are  to  be  executed 
at  the  occurrence  of  some  predetermined  event — such  as  real-time  queries — 
are  predefined  and  stored  in  the  DD/D,  along  with  a description  of  the 
initiating  stimulus.  The  GEUF  monitors  all  events  that  serve  as  stimuli, 
and  upon  receipt,  retrieves  the  corresponding  predefined  query  and  submits 
it  to  the  query  compiler  to  be  executed.  As  part  of  the  database,  the 
DD/D  can  be  implemented  by  whatever  means  are  required.  Retrieval  of  a 
predefined  query  from  the  DD/D  may  involve  only  a main  memory  access  if 
that  is  where  the  query  is  stored. 
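
The stimulus-to-query lookup described above can be sketched as follows. This is an illustration in Python under stated assumptions: the `PREDEFINED` table stands in for the DD/D, `compile_and_execute` stands in for the query compiler, and the stimulus name and query text are invented for the example.

```python
# Hypothetical DD/D fragment: stimulus name -> predefined query text.
PREDEFINED = {
    "END-OF-SHIFT": "GET S .OF. EMP/NAME",
}

executed = []

def compile_and_execute(query_text):
    """Stand-in for the query compiler/translator; it only records the query."""
    executed.append(query_text)

def on_event(stimulus):
    """Look up the query tied to a stimulus and submit it, if one is defined."""
    query = PREDEFINED.get(stimulus)
    if query is None:
        return False          # not a stimulus the monitor knows about
    compile_and_execute(query)
    return True

handled = on_event("END-OF-SHIFT")
ignored = on_event("UNKNOWN-KEY")
```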

Queries  that  are  partially  predetermined — either  the  stimulus  or  some 
portion  of  the  query  is  to  be  supplied  by  the  user  at  execution  time — 
are  predefined  and  stored  in  the  DD/D.  If  the  stimulus  (e.g.,  function 
key,  etc)  is  to  be  supplied,  its  description  is  stored  in  the  DD/D  along 
with  the  predefined  query  and  the  correlation  between  them.  When  the 
GEUF  receives  the  stimulus,  it  retrieves  the  corresponding  query  and  sub- 
mits it  to  the  query  compiler  for  execution. 

If  a portion  of  the  query  is  to  be  supplied  by  the  user  at  execution 
time  to  particularize  the  context  for  current  needs,  the  stimulus  is  the 
receipt  of  the  user-supplied  data  by  whatever  means  are  appropriate  (e.g., 
forms,  including  menus,  terse  command  language,  etc).  The  predefined  part 
(partially  predefined  query)  of  the  query  is  stored  as  relations  in  the 
DD/D,  and  the  description  of  the  external  representations  is  stored  as 
DIAM descriptions in the DD/D. The GEUF views user-supplied representations
as update or change queries to the relations containing the partially
predefined query and initiates the change by issuing updating
queries to the query compiler/translator, which in turn performs the update.
The completed query is then submitted to the compiler/translator
for  execution. 
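
Completion of a partially predefined query can be approximated by slot filling. The sketch below is a simplification: the text stores the predefined part as relations and completes it through update queries, whereas this example uses a plain string template. The template text and the `complete` helper are hypothetical.

```python
from string import Template

# Hypothetical stored template: the relation is fixed, the qualifier is open.
PARTIAL_QUERY = Template("GET S .OF. EMP/NAME,ADDRESS .WHERE. NAME='$name'")

def complete(partial, user_supplied):
    """Fill the open parts; fail loudly if the user left a slot empty."""
    return partial.substitute(user_supplied)   # raises KeyError on a missing slot

completed = complete(PARTIAL_QUERY, {"name": "JONES"})
```

Receipt of the user-supplied value plays the role of the stimulus: once the slots are filled, the completed text is what would be handed to the compiler/translator.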

The technique by which the user interfaces with partially predefined
queries  includes  all  techniques  required  in  the  KM  logical  system  design 
of  the  user  interface  subsystem.  Specifically,  the  eleven  required  inter- 
faces are  accommodated  as  follows.  A subset  of  the  natural  language  is 
accommodated  through  the  use  of  RIPL  ad  hoc  statements  using  the  concepts 
of  derivations  and  user  views.  A graphic  representation  is  definable  in 
DIAM  descriptions  for  either  input  (via  light  pen,  etc)  or  display.  A 
forms-oriented interface is provided by DIAM descriptions of whatever
geometry  is  desired  interacting  with  partially  predefined  queries.  Menus 
are  special  cases  of  forms,  and  entry  of  a 'selection  mode'  is  simply  the 
representation  of  either  data  to  complete  a partially  predefined  query  or 
the stimulus to execute a predefined query. Dialogue is provided by repeated
use of partially predefined queries in which receipt of a stimulus
executes a query that retrieves a form or prompting message, and the
subsequent receipt of the form is the stimulus to execute another predefined
query, complete and execute a partially predefined query, or both.

A transaction-oriented  interface  is  a special  case  of  forms  in  which  the 
geometry  of  the  form  is  a command-like  sentence. 

A relational interface is provided either by RIPL, or a forms-oriented
approach like Query by Example13 is provided by the forms interface to
partially  predefined  queries  so  that  the  partially  predefined  queries  in- 
clude only  the  relation  name  (or  range)  of  a statement,  and  all  other 
parts  of  the  query  (e.g.,  attribute  list,  function  code,  qualifying  pred- 
icates, etc)  are  user-supplied  at  execution  time.  Access  of  computational 
processes,  required  by  the  KM  programmatic  interface,  is  provided  by  in- 
clusion of  relational  descriptions  of  algorithms  in  the  RIPS  information 
structure  and  the  ability  of  the  compiler/translator  to  execute  the  al- 
gorithms. The  navigational  interface  is  provided  by  access  to  either  the 
database  or  the  DD/D  by  the  RIPS  concept  of  viewing  the  DD/D  as  part  of 
the  database  and  descriptions  of  DD/D  contents  as  part  of  the  information 
structure, thereby allowing use of all RIPS capabilities in the KM environment
(R-35). Text editing is discussed under Text Processing.

Transition of one interface to another (R-ll'l is accomplished by altering
DIAM descriptions of user-supplied representations, changing the
stimulus definition of predefined queries, or changing device driver
specifications for the query compiler/translator. All these, including
combinations, preserve the semantics of the underlying query. The ease
with which this can be accomplished is apparent from a technical standpoint:
the DD/D must be updated accordingly. However, from the user's
standpoint,  the  ease  of  changing  from  one  interface  to  another  is  properly 
the  subject  of  human  factors,  including  training,  environment,  expertise, 
aptitude,  etc.  However,  the  RIPS  concept  separates  the  semantics  from  the 
implementation  and  includes  a wide  range  of  alternatives  with  relatively 
small  programming  effort,  allowing  these  decisions  to  be  made  quickly  and 
effectively.  Thus,  RIPS  agrees  with  the  KM  assertion  that  a single  user 
interface  is  incapable  of  satisfying  user-community  needs  and  provides 
ease  of  migration  from  one  to  another,  technically,  leaving  the  choice  of 
techniques to the human factors discipline (R-32, R-34).

The  concept  of  partially  predefined  queries  satisfies  the  alerting  or 
trigger  query  functions  required  by  R-33,  and  allows  flexible  composition 
of  whatever  dialogue  or  CAI  is  required  by  R-36  and  R-40.  Standard  data 
entry  techniques  can  be  tailored  for  applicable  users  to  reduce  training 
and  increase  efficiency,  satisfying  R-48  and  the  external  considerations 
of  R-47. 


13. M. Stonebraker, E. Wong, and P. Kreps: "The Design and Implementation
of INGRES," ACM Transactions on Database Systems, Vol 1, No. 3, September
1976, pp 189-222.

Simulation Subsystem


Performance  analysis,  evaluation,  and  DMS  software  selection  are  in- 
tegral parts  of  RIPS.  While  the  DMS  simulator  is  not  allocated  to  any 
of  the  three  KM  subsystems,  it  does  use  many  RIPS  components  that  are. 
Specifically,  the  information  structure,  QDD,  DIAM  descriptions,  and  major 
functions  of  the  query  compiler  are  all  part  of  the  simulator. 


In  the  math-model  simulation  mode,  the  information-based  model  gener- 
ates simulated  time-tagged  queries  that  statistically  represent  the  work- 
load described  in  the  QDD  of  the  application  under  study.  In  the  KM  en- 
vironment, if  the  purpose  of  a performance  analysis  is  to  evaluate  im- 
plementations of  the  existing  workload,  the  current  information  structure 
and QDD constitute the workload and thus become the input to the information-based
model of the simulator, and the candidate implementations are
described  in  the  lower-level  models  of  the  simulator.  If  the  purpose  is 
to  evaluate  the  effect  of  changing  workloads  on  the  existing  implementa- 
tion, the  modified  QDD  (to  reflect  the  new  workload)  becomes  the  input  to 
the  information-based  model,  and  current  implementation  descriptions 
(DIAM  descriptions)  form  the  input  to  the  lower  levels.  If  the  purpose 
is  to  determine  the  effect  of  a change  to  the  information  structure,  then 
both  the  modified  QDD  and  candidate  implementation  descriptions  are  input 
to the simulator. If the purpose is to evaluate a different host computer
for  the  existing  implementation,  the  host  computer  model  of  the  simulator 
must  be  changed  accordingly. 


In  the  real-time  mode,  QDD  static  descriptions  are  used  to  generate 
statistically  significant  symbolic  instances  that  the  compiler/trans- 
lator  translates  into  actual  update  instructions  to  the  DBMS  under  study. 
This  produces  a baseline  sample  database  to  provide  real-time  performance 
measures.  The  time-tagged  transactions  or  queries  that  represent  the 
workload  are  generated  by  a query  generation  module  according  to  the  QDD 
parameters. These are similarly translated into the DML with compatible
symbolic  values  in  the  qualifiers  and  submitted  to  the  actual  DBMS  under 
control  of  the  simulator.  Performance  measures  are  accumulated  by  a com- 
mercial performance  measurement  program  installed  in  the  host  computer. 


These  capabilities  satisfy  R-4  by  providing  evaluation  techniques, 
sample  databases,  benchmarks,  and  checks  on  the  consistency  of  a proposed 
implementation.  R-10  is  satisfied  by  allowing  modeling  of  significant 
performance  variables  for  implementation  tuning  and  for  comparing  design 
specifications  with  actual  use.  Simulation  of  the  performance  of  candi- 
date access  methods  for  the  specific  application  described  by  QDD  satis- 
fies R-12,  R-18,  and  R-20,  and  produces  an  accurate  measure  of  database 
size  owing  to  the  data  and  access  path  overhead  before  implementation, 
satisfying R-19 and R-25. R-39 is satisfied in the simulation mode before
implementation, and by the query optimizer during operations.


Validation  of  conceptual  and  internal  consistency  (R-43)  is  provided 
by the simulators. Validation of external schemata has not been specifically
addressed in RIPS. R-46 is satisfied through simulation before implementation,
and by the RIPL preprocessor of the GEUF and the RIPL
compiler during operations. The requirements of R-47 are satisfied to the
extent  that  candidate  solutions  to  the  entry  of  large  volumes  of  data  can 
be  evaluated  by  simulation. 


In  addition  to  discrete-event  simulation,  RIPS  has  addressed  the 
assessment of inherently unquantifiable characteristics of candidate
DBMS  software  in  the  evaluation  and  selection  process.  Among  these  are 
such  characteristics  as  vendor  support,  ease  of  use,  reliability,  con- 
formance to  standards,  and  others.  The  basis  of  this  methodology  is  rec- 
ognition that  any  two  candidate  software  packages  can  be  made  functionally 
equivalent  in  virtually  every  respect  through  additional  programming  and/ 
or  contracted  services.  The  cost  or  time  required  to  provide  this  addi- 
tional effort  for  each  characteristic  is  the  key  parameter  in  the  decision 
process. These extensions to DBMS simulation performance analysis capabilities
provide a complete database system design and selection methodology,
satisfying R-3, R-4, R-5, and R-12.
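
The selection idea, comparing packages by the effort needed to make them functionally equivalent, can be shown with simple arithmetic. The package names, characteristics, and staff-week figures below are invented for illustration and do not come from the report.

```python
# Hypothetical figures: cost (in staff-weeks) to bring each package up to
# the required level for each characteristic. Zero means already adequate.
GAP_COST = {
    "DBMS-A": {"vendor support": 0, "ease of use": 6, "standards": 2},
    "DBMS-B": {"vendor support": 4, "ease of use": 0, "standards": 0},
}

def equivalence_cost(package):
    """Total extra effort needed to make a package functionally equivalent."""
    return sum(GAP_COST[package].values())

# Packages ordered from least to most additional effort.
ranking = sorted(GAP_COST, key=equivalence_cost)
```

Expressing each unquantifiable characteristic as a cost to close the gap puts all of them on one scale, which is what allows them to be summed and compared.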


KM  EXTENSIONS  TO  RIPS 

KM  concepts  require  capabilities  that  have  been  considered  for  RIPS, 
but  the  method  of  their  implementation — or  whether  they  will  be  incorpo- 
rated at  all — has  not  been  determined.  These  include  time  dependencies 
(R-51), application of probabilistic rules (R-50), context integrity (R-45),
special data presentation techniques (R-32, R-34), dynamic network resource
allocation (R-28), dynamic data structure change (R-11, R-24),
automatic schema generation (R-23), concurrency resolution (R-14), improved
security techniques (R-13), and text processing. The potential
for  incorporating  these  requirements  in  RIPS  is  discussed  below. 

Time  Dependencies 

All  data  relations  are  time-dependent;  but  because  the  majority  of 
users' needs pertain to current data, time-dependent specifications are
implemented  by  existing  systems  for  only  those  data  that  the  user  antici- 
pates will  require  frequent  retrieval  for  specific  times.  The  relations 

EMP (E#, NAME, MARITAL-STATUS, ADDRESS, ...)

SAL-HIST (E#, DATE, SAL)

recognize that an employee's salary changes, and implement the time-dependent
knowledge of such changes. Of course, the employee's marital
status,  address,  and  even  name  are  also  subject  to  change  with  time,  but 
ready  access  to  this  specific  knowledge  is  not  anticipated,  and  its  re- 
tention is  usually  in  the  form  of  archived  databases.  Within  the  period 
of a stable information structure, say time T1 to T4, there may be one
implementation (e.g., checkpoints, archival files) for T1 and T3 and another
(e.g., on-line database) for T4. A query to find an employee's address
at time T2 can be answered by applying the transaction log covering
T1 to T2 against the T1 archive. Thus, the transaction log represents the
database change from T1 to T4, and time dependencies require that we view
the transaction log as a natural part of the database.

The KM concept would make such queries possible, and the processing
would be clear to the user. RIPS already envisions the potential for relations
being implemented by multiple schemata, including the transaction
log,  and  in  this  case,  instances  of  EMP  relation  attributes  are  distrib- 
uted by  restrictions  on  the  (unstored)  time  domain,  just  as  tuples  in 
SAL-HIST  could  conceivably  be  physically  distributed  by  restrictions  on 
the  (stored)  date  domain. 
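
Answering a query for an earlier time by applying the transaction log to an archive can be sketched as below. The archive contents, the log layout (time, key, attribute, value), and the `state_as_of` helper are assumptions made for the example, not a description of how RIPS stores either.

```python
# Hypothetical archive taken at time 1 and a log of later changes,
# each entry being (time, key, attribute, new value).
ARCHIVE_T1 = {101: {"NAME": "JONES", "ADDRESS": "12 ELM ST"}}
LOG = [
    (2, 101, "ADDRESS", "40 OAK AVE"),
    (3, 101, "ADDRESS", "7 PINE RD"),
]

def state_as_of(archive, log, when):
    """Replay logged changes up to and including `when` on a copy of the archive."""
    state = {key: dict(row) for key, row in archive.items()}
    for time, key, attr, value in sorted(log):
        if time > when:
            break
        state.setdefault(key, {})[attr] = value
    return state

address_at_t2 = state_as_of(ARCHIVE_T1, LOG, 2)[101]["ADDRESS"]
```

The archive itself is never modified; each query for a past time works on its own copy.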

A unified  view  of  the  time  dependencies  of  relations  would  recognize 
that  all  relations  have  a time  domain,  and  the  single  relation 

EMP (E#, NAME, SAL, MARITAL-STATUS, ADDRESS, TIME)

would  suffice  as  the  information  structure  for  any  time-dependent  query, 
including  salary  history,  marital-status  history,  etc,  and  query  language 
need  not  change  to  accommodate  specification  of  time  dependencies.  How- 
ever, this  extension  requires  modification  of  current  concepts  of  tuple 
identifiers,  derivations,  user  views,  and  other  components  of  RIPS,  and 
it is not clear what the effect of such modifications might be.

Application of Probabilistic Rules

Some types of probabilistic rules (fuzzy logic) can easily be handled
by existing RIPS concepts. In Reference 21, Figure 9 is presented as an
example of fuzzy knowledge. In this work, the authors present a system,
Fuzzy-Set-Theoretic Data Structure (FSTDS), for implementing such knowledge
so that queries like "what does a bat belong to?" can be answered.


[Figure 9. Example of fuzzy knowledge: a graph rooted at Animal; figure not reproduced.]


The  edges  of  the  graph  in  the  figure  represent  a compatibility  factor 
for  the  association  between  the  nodes  that  has  meaning  to  users  of  such 
knowledge. Thus, bird is associated with animal with a compatibility
factor  "middle,"  whatever  that  means. 


21. Masaharu Mizumoto, Motohide Umano, and Kokichi Tanaka: "Implementation
of a Fuzzy-Set-Theoretic Data Structure System," Presented at
Third International Conference on Very Large Databases, Tokyo,
Japan, October 1977.

In the RIPS information structure, semantic concepts are made explicit;
thus, a relational view of the knowledge would recognize the concept of
compatibility as perhaps the relations

ANIMAL-TO-CLASS-COMPATIBILITY-ASSOCIATION

CLASS-TO-SPECIE-COMPATIBILITY-ASSOCIATION.

The  attributes  and  instances  of  these  relations,  renamed  for  convenience, 
are  shown  in  Figure  10.  Now,  the  query  "what  class  does  the  species  bat 
belong  to  and  with  what  compatibility?"  is  easily  expressed  in  RIPL  as 

GET S .OF. CLASS-SPECIE/CLASS,COMP .WHERE. SPECIE='BAT'

which  would  produce  the  same  answer  as  that  produced  for  the  previously 
stated  query  to  the  FSTDS  system. 


Animal-Class (Class, Comp)

    Class     Comp
    Bird      middle
    Mammal    high
    Fish      low

Class-Specie (Class, Specie, Comp)

    Class     Specie    Comp
    Bird      Canary    1
    Bird      Bat       0.5
    Mammal    Bat       high
    Mammal    Whale     0.8
    Fish      Whale     0.7
    Fish      Salmon    1

Figure 10. Possible relational view of fuzzy knowledge.

The  example  demonstrates  that,  while  fuzzy  knowledge  exists,  in  real- 
world  situations,  people  must  deal  with  the  lack  of  certainty,  or  proba- 
bilities, by  some  means — commonly  by  assigning  a value  factor  to  the  al- 
ternatives and  making  decisions  based  in  some  manner  on  these  values.  To 
repeat  the  process,  an  automated  system  must  be  provided  with  the  same 
values  and  rules  for  choosing  between  alternatives. 
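
The relational view of Figure 10 and the query about the species bat can be mirrored in a few lines of Python. The tuples are copied from Figure 10 as printed, including the non-numeric value "high"; the `classes_of` function is a hypothetical stand-in for the RIPL GET statement.

```python
# Rows of the Class-Specie relation as (class, specie, comp), from Figure 10.
CLASS_SPECIE = [
    ("Bird", "Canary", 1), ("Bird", "Bat", 0.5),
    ("Mammal", "Bat", "high"), ("Mammal", "Whale", 0.8),
    ("Fish", "Whale", 0.7), ("Fish", "Salmon", 1),
]

def classes_of(specie):
    """Select CLASS and COMP for one specie, like the RIPL GET statement."""
    return [(cls, comp) for cls, sp, comp in CLASS_SPECIE if sp == specie]

bat = classes_of("Bat")
```

The result lists every class the species is associated with, each paired with its compatibility factor, which is the answer the text attributes to the FSTDS query.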

In  this  regard,  the  concepts  of  fuzzy  logic  (R-50)  and  validity  val- 
ues (R-44)  converge  when  we  view  the  validity  values  as  the  semantics  of 
uncertainty.  Thus  for  example,  given  the  situation  in  which  the  location 
of an airfield may be supplied by multiple sources, we can assign a validity
value to each whose semantics are made clear to users. A value supplied
by a satellite may be considered more reliable (i.e., have a higher
validity value) than that supplied by an aircraft, which in turn is more
reliable than that from a ground observation. By assigning numeric validity
values to each type of source, we can declare the semantics in the
relations 

AIRFIELD (NAME, TYPE, #RUNWAYS, ...)


AIRFIELD-LOC  (NAME,  LOC,  VALIDITY) 


A query  to  find  the  most  probable  location  of  airfield  X can  be  stated  in 
RIPL  as 

GET S .OF. AIRFIELD/LOC,VALIDITY .WHERE. NAME='X'

GET T .OF. MAX/MAX-A .WHERE. A'=S/VALIDITY'

PRINT U .OF. S/LOC .WHERE. VALIDITY=T/MAX-A

In  RIPS,  the  concept  of  "most  probable  airfield  location"  can  be  specified 
in  general  through  the  use  of  derivations,  and  this  knowledge  can  be  made 
available  to  users.  Thus,  the  specification 

S(AIRFIELD)=AIRFIELD-LOC/NAME,LOC,VALIDITY .WHERE. NAME=AIRFIELD/NAME

T(AIRFIELD)=MAX/MAX-A .WHERE. A'=S(AIRFIELD)/VALIDITY'

MOST-PROB-LOC(AIRFIELD)=S(AIRFIELD)/LOC .WHERE.
VALIDITY=T(AIRFIELD)/MAX-A

stored  in  the  DD/D  extends  the  information  structure,  producing  the  de- 
rived attribute 

"MOST-PROB-LOC" 

in  the  context  of  airfield  as 

AIRFIELD (NAME, TYPE, #RUNWAYS, MOST-PROB-LOC, ...)

and  now  the  RIPL^  query  can  be  stated  by  users  as 

GET S .OF. AIRFIELD/MOST-PROB-LOC .WHERE. NAME='X'

which the GEUF will reduce to a RIPLq query.

In  keeping  with  KM  philosophy,  the  knowledge  concept  is  visible  to  the 
knowledge  manager  because  the  derivation  is  accessible  from  the  DD/D. 
Additional  explanation  of  the  history  or  rationale  behind  the  concept  can 
be stored in the DD/D and automatically displayed to users querying the
derived  attribute  through  the  RIPS  concepts  of  predefined  queries. 
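
The three-step MOST-PROB-LOC derivation (select the candidate locations, find the maximum validity, keep the locations that carry it) can be traced in Python. The tuples, coordinates, and validity numbers are hypothetical; only the shape of AIRFIELD-LOC (NAME, LOC, VALIDITY) comes from the text.

```python
# Hypothetical AIRFIELD-LOC tuples: (name, loc, validity).
AIRFIELD_LOC = [
    ("X", "34.10N 117.20W", 0.9),   # e.g. satellite
    ("X", "34.12N 117.25W", 0.6),   # e.g. aircraft
    ("Y", "40.00N 105.00W", 0.3),   # e.g. ground observation
]

def most_prob_loc(name):
    """Mirror the derivation: select S, take the maximum T, select LOC."""
    s = [(loc, validity) for n, loc, validity in AIRFIELD_LOC if n == name]
    if not s:
        return []
    t = max(validity for _, validity in s)
    return [loc for loc, validity in s if validity == t]

loc_x = most_prob_loc("X")
```

A list is returned because two sources may tie at the maximum validity; the derivation as written would return both locations in that case.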

The preceding is not intended to be a final statement of the application
of fuzzy logic, probabilistic rules, or validity values, but only
to  describe  current  RIPS  concepts.  Further  investigation  of  the  subject 
will  lead  to  a unified  concept  for  handling  uncertainty  by  making  the 
semantics  visible  to  the  using  community. 

Context  Integrity 

We have been investigating the use of Dana Scott's lattice theory
logic22,23,24 for representing unknown or unavailable and inconsistent
information for knowledge management. This lattice, or four-valued logic
shown in Figure 11, has in addition to the usual truth values the symbols
bottom (⊥) and top (⊤). Bottom means that the value is unknown or unavailable
(at least to the particular user) at this time. Top means that information
is inconsistent. We view the symmetry of this lattice as a useful
model for, on one hand, attempting to retrieve information that is not
available, and on the other, attempting to add information in violation of
integrity assertions.

22. Dana Scott: The Lattice of Flow Diagrams. Technical Memo No. PRG-3,
Programming Research Group, Oxford University Computing Laboratory,
45 Banbury Rd, Oxford, England.

23. Dana Scott: "Logic and Programming Languages," Communications of the
ACM, Vol 20, No. 9, pp 634-641, September 1977.

24. D. S. Scott: "Data Types as Lattices," SIAM Journal on Computing, 5,


Figure  11.  Scott's  lattice 

Scott's  theory  introduces  an  information  theoretic  ordering  relation 
(a ⊑ b), which means that a is consistent with b as far as it goes, but
that b may have more information. Thus, ⊥ ⊑ false, ⊥ ⊑ true, false ⊑ ⊤,
and true ⊑ ⊤. This ordering can be extended to all types of data as well
as  truth  values.  Figure  12  shows  the  lattice  applied  to  numbers.  In  this 
case,  bottom  represents  an  unknown  or  inaccessible  number,  and  top  repre- 
sents an  attempt  to  assign  two  different  numbers  to  a data  element  that 
may  have  only  a single  value. 


Figure  12.  Scott's  lattice  applied  to  numbers. 

Based  on  the  ordering,  the  theory  defines  limits  of  sequences  and  con- 
tinuity in  a manner  similar  to  mathematical  analysis  or  topology.  The  no- 
tation of  a monotonic  sequence  approaching  a limiting  value  is  used  to 
mean  successive  increments  of  consistent  information  approaching  the  maxi- 
mum truthful  information  available. 
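
The flat lattice with bottom and top can be expressed as a small join operation. This is an illustrative sketch only: the string markers `BOTTOM` and `TOP` and the function names are assumptions, and the sketch covers the ordering alone, not the theory of limits and continuity.

```python
BOTTOM, TOP = "bottom", "top"   # unknown/unavailable, inconsistent

def join(a, b):
    """Least upper bound in the flat lattice: combine two pieces of information."""
    if a == BOTTOM:
        return b
    if b == BOTTOM or a == b:
        return a
    return TOP                   # two different definite values conflict

def approximates(a, b):
    """Information ordering: a is consistent with b, and b may say more."""
    return join(a, b) == b

unknown_then_true = join(BOTTOM, True)
conflict = join(True, False)
number_conflict = join(3, 4)
```

Joining an unknown value with a definite one yields the definite value, while joining two different definite values yields top, which matches the reading of top as an attempt to assign two values where only one is allowed.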

The  objective  of  this  research  is  to  combine  the  concepts  of  integrity, 
authorization,  and  concurrency  resolution  into  a single  theory  and  imple- 
mentation. 


Dynamic  Network  Resource  Allocation 

The  issue  of  resource  requirements  (R-28)  is  addressed  in  RIPS  through 
the database system simulator to the extent that, for a given application
and DBMS, the host system, traffic demands, and distributions required by
the  application  can  be  analyzed  for  each  data  user,  source,  channel,  and 
storage  device  defined.  These  parameters  are  used  in  determining  the  ad- 
equacy of  the  given  system  to  accommodate  the  application.  Thus,  while 
no  automated  network  resource  design  or  dynamic  resource  allocation  capa- 
bility is  being  pursued,  some  key  elements  in  any  such  decision-making 
model  are  included  in  the  RIPS  QDD  and  measurable  by  the  simulator. 

Dynamic  Restructuring 

The  requirement  for  dynamic  restructuring  (R-ll,  R-24)  implies  the 
need  for  data-use  descriptions  that  constitute  the  decision-making  cri- 
teria for  selecting  from  among  alternative  data  structures.  In  RIPS, 
the  QDD  serves  this  purpose  but  is  used  for  data  structure  evaluation 
as  opposed  to  data  structure  design.  However,  the  following  example 
demonstrates  the  potential  for  employing  the  QDD  and  other  RIPS  com- 
ponents in  a cybernetic  system. 

The QDD parameter, associated qualification rate (AQR), describes the
rate  at  which  an  attribute  is  used  in  qualifying  tuples  of  its  associated 
relation.  Consider  the  relation 

EMP(E#, NAME, SECURITY-CLEARANCE,  . . .) 

and  the  environment  in  which  queries  about  employees  are  commonly  quali- 
fied by  particular  security  clearances.  If  the  rate  of  such  qualification 
is  high  enough,  the  DBA  could  choose  to  implement  an  index  of  security 
clearance  values  to  facilitate  quick  retrievals.  If,  over  a period  of 
time,  the  organization  becomes  less  involved  in  security  projects,  such 
qualifications  may  become  extremely  rare.  Now  the  index  may  be  consuming 
resources  disproportionate  to  its  utility.  If  the  QDD  parameters  were 
maintained dynamically by sampling users' production queries, the value
of  AQR  for  security  clearance  would  eventually  fall  below  some  DBA-pre- 
scribed threshold,  indicating  that  there  is  no  longer  justification  for 
the  index.  In  RIPS,  it  would  be  relatively  simple  to  define  this  situ- 
ation as  a trigger  to  execute  partially  predefined  queries  that  would 
change  the  DIAM  descriptions  of  the  implementation  in  the  DD/D  and  to 
issue  DDL  statements  to  the  DBMS  to  eliminate  the  index. 
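
The threshold test in this example can be sketched as follows. The sketch assumes that AQR is measured as the fraction of sampled queries that qualify on the attribute, which is one plausible reading of "rate" and is not confirmed by the text; the sample data, the threshold, and the action strings are hypothetical.

```python
def aqr(queries, attribute):
    """Fraction of sampled queries whose qualifier mentions the attribute."""
    if not queries:
        return 0.0
    hits = sum(1 for qualifiers in queries if attribute in qualifiers)
    return hits / len(queries)

def index_actions(queries, attribute, threshold):
    """Return the follow-up actions once AQR drops below the DBA's threshold."""
    if aqr(queries, attribute) < threshold:
        return ["update DIAM description in DD/D", "issue DDL to drop index"]
    return []

# Hypothetical sample: each entry is the set of attributes used as qualifiers.
sample = [{"NAME"}, {"E#"}, {"NAME", "SECURITY-CLEARANCE"}, {"E#"}]
actions = index_actions(sample, "SECURITY-CLEARANCE", threshold=0.5)
```

Returning the actions as data, rather than performing them, leaves room for a person to review the measurement before anything is changed.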

While the preceding is a rather simple example of dynamic restructuring,
extension to more complex considerations appears promising. However,
serious problems can result from treating changing profiles as permanent
conditions, when in fact they are anomalies. Some intervention
may always be required to determine the underlying reason for measured
changes and to initiate further actions on the basis of these findings.

Automated  Schema  Generation 

Many  of  the  considerations  discussed  in  the  preceding  paragraph  per- 
tain here.  Automated  generation  of  the  conceptual  schema  would  require 
some statement of the functional dependencies that exist in the real
world. Excellent work has been done in this area (e.g., Reference 12) in
which  the  result  of  a collection  of  functional  dependencies  is  a set  of 
third-normal-form  relations.  However,  because  the  solution  is  not  unique, 
the  utility  of  these  techniques  in  a production  environment  is  question- 
able. We  are  investigating  a fourth  normalization  that  will  include  fur- 
ther real-world  constraints  and  lead  to  a unique  solution. 

Except  for  default  display  formats,  external  schemata  generation  can 
be  automated  only  to  the  extent  that  the  interfacing  techniques  provided 
for  the  user  can  be  made  easy  to  use.  This  is  because  display  formats 
are  generally  dictated  by  external  considerations  (e.g.,  government  forms, 
industry  standards,  etc).  Providing  a suitable  user  interface  is  already 
envisioned  in  RIPS,  but  the  details  are  properly  the  subject  of  human  fac- 
tors. A good  example  of  such  an  interface  is  Query  by  Example  and  the 
System for Business Automation,14 which can be user defined in RIPS using
current  concepts. 

Internal  schemata  generation  can  be  automated  by  modeling  a designer's 
decision-making  process  within  the  alternatives  offered  by  his  DBMS.  The 
example  in  the  preceding  paragraph  illustrates  this  concept  using  the  QDD 
parameters  of  the  application  in  the  decision-making  process. 

Concurrency  Resolution 

In  RIPS,  the  problem  of  concurrency  resolution  is  considered  to  be 
partially  resolvable  at  the  information  level.  In  general,  however,  if 
multiple  sources  can  update  the  same  data,  so  that  one  update  may  super- 
sede another,  there  is  some  question  of  organizational  consistency.  The 
example  often  presented  involves  a case  in  which  one  source  wants  to  give 
a specific  employee  a 10%  increase  in  salary  and  another  source  wants  to 
increase  the  salary  of  all  employees  by  5%.  The  order  in  which  these 
queries  are  processed  will  affect  the  final  salary  of  the  specific  em- 
ployee. However,  the  problem  is  not  one  of  data  processing  but  one  of 
organizational  policy.  If  such  an  eventuality  can  occur  in  some  context, 
then  rules  for  which  query  is  to  be  processed  first  must  be  provided. 

In  many  cases,  the  analysis  required  to  derive  such  concurrency  rules 
will  result  in  recognition  that  the  semantics  of  the  attribute  subject  to 
update  is  such  that  there  are  in  fact  multiple  attributes  in  question. 

For  example,  if  the  location  of  an  airfield  can  be  provided  by  two  sources, 
retention  of  both  may  be  useful  and  must  therefore  be  recognized  in  the 
information  structure.  Thus,  rather  than  the  relation 

AIRFIELD (NAME, LOC, ...)

we have

AIRFIELD (NAME, LOC-PER-SOURCE-1, LOC-PER-SOURCE-2, ...)

and the order in which two simultaneous updates are processed is immaterial
with respect to the final values. The example can easily be expanded
to include recognition of probabilistic or accuracy values owing to the
source.

12. Morton M. Astrahan and Donald D. Chamberlin: Implementation of a
Structured English Query Language. RJ1464, IBM Research Center, San
Jose, California, October 28, 1974.

14. M. M. Zloof: "Query by Example," Proc. National Computer Conference,
AFIPS Press, Vol 44, 1975, pp 431-438.

In  general,  the  case  of  retrievals  for  data  currently  being  updated 
can  be  handled  at  the  information  level  by  examining  queries  in  process 
before  submitting  the  retrieval.  If  the  same  relations  are  involved  in 
the  update  as  in  the  retrieval,  the  retrieval  must  be  held  until  comple- 
tion of  the  update.  In  a distributed  environment,  the  knowledge  that  a 
pertinent  relation  is  being  updated  by  a remote  site  may  not  be  available 
to  the  site  where  the  retrieval  query  originated.  A satisfactory  solution 
to this problem remains to be found.
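
For the single-site case, the check on queries in process amounts to a set intersection. The function and relation sets below are hypothetical; the sketch deliberately ignores the distributed case, for which the text notes that no satisfactory solution is known.

```python
def must_hold(retrieval_relations, updates_in_process):
    """True if any in-process update touches a relation the retrieval reads."""
    needed = set(retrieval_relations)
    return any(needed & set(update) for update in updates_in_process)

# Hypothetical updates currently in process, each given by the relations it changes.
in_process = [{"SAL-HIST"}, {"AIRFIELD-LOC"}]
hold_emp = must_hold({"EMP"}, in_process)
hold_sal = must_hold({"EMP", "SAL-HIST"}, in_process)
```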

In  an  environment  in  which  time  dependencies  may  be  expressed  in  the 
query language (see Time Dependencies), the concurrency resolution problem
is expanded for the previous example. If a retrieval query is received
specifying time T1, and an update has been processed at time T2,
the  update  transaction  must  be  backed  out  in  order  to  answer  the  query. 

In  general,  however,  the  concurrency  resolution  routine  can  determine 
whether  any  conflicting  updates  have  been  processed  if  it  has  access  to 
the  transaction  log.  If  no  updates  have  occurred  between  the  time  speci- 
fied in  the  retrieval  query  and  the  current  time,  the  query  can  be  pro- 
cessed immediately. 
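
The test for conflicting updates against the transaction log can be sketched as a filter over log entries. The log layout and the `conflicting_updates` helper are assumptions for illustration only.

```python
def conflicting_updates(log, relation, query_time, now):
    """Log entries for the relation made after query_time, up to now."""
    return [entry for entry in log
            if entry["relation"] == relation and query_time < entry["time"] <= now]

# Hypothetical transaction log entries.
log = [
    {"time": 2, "relation": "EMP", "change": "ADDRESS of E# 101"},
    {"time": 5, "relation": "SAL-HIST", "change": "new salary row"},
]
to_back_out = conflicting_updates(log, "EMP", query_time=1, now=6)
none_needed = conflicting_updates(log, "EMP", query_time=3, now=6)
```

An empty result means the query can be processed immediately; a non-empty result lists the updates that would have to be backed out to answer it.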

Improved  Security  Techniques 

Concepts  of  ensuring  security  or  authorization  at  the  information 
level  have  been  described.  Whether  or  not  additional  security  techniques 
(e.g.,  hardware-provided  encryption,  etc)  will  affect  the  concepts  remains 
to  be  determined. 

Data  Presentation 

RIPS  envisions  commonly  used  but  diverse  data  presentation  techniques 
including graphics, but has not addressed such concepts as imposing digitally
generated data over externally provided formats, holographic imagery,
etc. RIPS' primary concern is to provide the internal representation to a
device driver that will result in the external visual representation
sought. The major apparent problem is that users think in terms of visual
representation and want to describe displays in similar terms. The GEUF
is  intended  to  provide  this  interface  to  facilitate  declaration  of  formats. 
The  corresponding  internal  representations  appear  to  be  describable  in 
DIAM  terms,  at  least  for  commonly  used  devices  and  displays,  as  described 
later,  but  the  choice  of  display  formats  for  particular  applications  and 
users  is  the  subject  of  human  factors  and  human  engineering,  limited  only 
by  available  devices. 

RIPS  has  proposed  the  extension  of  DIAM  descriptions  to  include  speci- 
fication of  two-dimensional  displacements,  simply  because  displays  are 
viewed  in  two  dimensions.  It  remains  to  be  determined  whether  this  ex- 
tension will  suffice  for  the  types  of  data  presentation  envisioned  by  KM. 

Text Processing


Conceptually,  a text  processing  system  is  a database  application  in 
which  relations  that  comprise  the  information  structure  or  conceptual 
view  are  documents,  pages,  paragraphs,  sentences,  words,  etc.  Database 
functions  are  required  to  retrieve  and  maintain  text  data  in  the  same  way 
as for any other entity representations, but of course some storage structures
are much more efficient for text processing than other types of data
processing.

In  general,  characteristics  that  cloud  these  similarities  are  the 
user's  language,  limited  information  structure,  uniform  storage  tech- 
niques, and  document-oriented  display  formats.  Also,  text  processing 
systems  make  extensive  use  of  temporary  (working)  storage  where  documents 
and  pages  are  modified  in  a fast-access  temporary  storage,  then  placed  in 
a permanent  file  on  completion. 

Conceptually, RIPS accommodates all these characteristics, including
the  declaration  of  temporary  relations  that  could  well  be  documents  or 
pages,  and  thus  envisions  text  processing  as  a natural  part  of  the  MIS. 
However,  the  precise  relations  that  constitute  a sufficient  information 
structure  for  generalized  text  processing  and  application  of  the  RIPS 
concepts  in  defining  a suitable  user's  language,  display  formats,  and 
storage structures have not been analyzed in this specialized application.


RIPS  EXTENSIONS  TO  KM 

The major conceptual extension of KM that RIPS provides lies in the
degree and methodology of providing visibility to knowledge concepts. In
the RIPS view, management of knowledge concepts is precisely the management
of application functions, and for effective control, the functions
must be visible, not bound indistinguishably in application programs.

This  major  concept  has  led  RIPS  to  eliminate,  as  much  as  possible,  the 
common  practice  of  application  program  development,  thus  allowing  users 
to specify 'what' information is required, not 'how' it is to be obtained.
The  potential  of  this  concept  is  a solution  to  many  problems  of  applica- 
tions programming  practice,  including  modularization,  structured  tech- 
niques, language  usage,  programming  standards,  etc. 

KM  concepts  require  a comprehensive  set  of  capabilities — far  more 
than  any  existing  system  provides.  However,  even  if  all  the  require- 
ments were  to  be  provided  by  a production  system,  implementation  methods 
could  be  so  diverse  that  the  total  system  would  be  unmanageable.  For 
example,  if  schemata  descriptions  for  a production  system  differ  from 
those  for  a simulation  system,  interfacing  problems  could  result  in  such 
a lengthy  definition  phase  that  timely  results  would  be  impossible.  It 
is  not  only  a requirement  that  KM  capabilities  be  provided,  but  it  is 
equally  a requirement  that  techniques  employed  be  consistent  to  be 
manageable.


RIPS  provides  this  compatibility  throughout.  The  same  QDDs  are  used 
for  query  optimization  as  are  used  for  simulation;  schema  descriptions  use 
the  same  formalism  for  internal  and  external  schemata  and  for  internal 
schemata of both the production system and the simulator; the DD/D declarations
and retrievals use the same language as production queries; etc.
Thus,  RIPS  concepts  extend  the  requirements  to  include  those  of  consistency 
in  implementation. 

The  RIPS  concept  of  providing  independence  of  queries  from  their  im- 
plementation extends  to  external  or  display  formats.  As  previously  stated, 
this independence is necessary to realize the ANSI-X3-SPARC architecture.
Thus,  while  KM  requires  orderly  migration  of  internal  and  external  func- 
tions, RIPS  makes  such  requirements  more  explicit  by  defining  the  elements 
for  realizing  them. 

The  KM  concept  envisions  a high  degree  of  knowledge  of  the  requirements 
for  nodes  in  the  network,  especially  in  processing  optimization  and  per- 
formance analysis.  In  RIPS,  these  requirements  are  made  explicit  through 
quantitative data descriptions that are employed as a natural element of
system architecture. The effect of this concept is a unified view of processing
requirements and information management; both use the same
formalisms.


FUNCTIONAL  ALLOCATION  OF  KM  REQUIREMENTS  TO  RIPS 

This  section  summarizes  the  functional  capabilities  of  RIPS  and  shows 
the  correlation  to  KM  functions.  Table  1 contains  the  allocation  assign- 
ments. Each  column  of  the  matrix  represents  a functional  component  of 
RIPS  and  is  described  below.  The  three  columns  at  the  right  of  the  table 
indicate  that  the  conceptual  foundation  required  to  accommodate  corres- 
ponding KM  functions  is  incomplete,  undetermined,  or  not  considered  part 
of RIPS. Where entries are made in one of these columns and in one or
more of the other columns, the corresponding KM requirement is only partially
satisfied by the current RIPS concept as further described below.

Unless  otherwise  noted,  KM  requirements  reference  those  listed  under 
Synopsis  of  KM  Requirements  (page  28ff). 

Generalization  of  Requirements 

Generalization  of  requirements  recognizes  the  fact  that  requirements 
are  not  only  necessary  for  an  initial  system  implementation,  but  because 
they  are  evolutionary,  they  are  also  the  basis  for  changes  to  implemen- 
tations. The  original  statement  of  requirements  should  be  maintained  to 
allow  visibility  of  gradual  changes  that  foretell  the  need  for  corres- 
ponding implementation  changes  to  accommodate  them. 

To be sufficient, virtually every change to an implementation and
every choice made between alternative techniques should be justifiable
in terms of the requirements. Thus, they are necessary for all performance
analyses and all management functions dealing with organizational data
flows and real-world dynamics.

[Table 1 note: explanation of numbered notes on pages 59 and 60.]

Generalization  of  requirements  in  RIPS  refers  to  the  information  struc- 
ture and  QDD  stored  in  the  DD/D,  maintained  through  the  use  of  predefined 
queries,  and  accessible  through  the  RIPL  and  GEUF  capabilities. 

DMS  Software  Evaluation  Methodology 

DMS software evaluation methodology is a set of techniques for choosing
between alternative software packages for a known application in a given
environment. The techniques must be both consistent and repeatable
(i.e., two analysts must arrive at the same conclusion). The
evaluation includes both qualitative and quantitative characteristics.

Software  evaluation  methodology  in  RIPS  refers  to  the  use  of  QDD, 
math-model  and  real-time  simulators,  and  procedures  for  evaluating  quali- 
tative characteristics  in  terms  of  the  time  and  cost  required  to  provide 
the  same  qualitative  characteristics  for  each  alternative. 

Performance  Evaluation 

Performance  evaluation  is  a means  of  predicting  the  performance  of  a 
DBMS  implementation  for  a known  application  before  implementation.  The 
results  of  such  simulations  are  the  basis  for  choosing  between  alternative 
implementations.

Performance  simulation  in  RIPS  refers  to  the  use  of  the  QDD  and  the 
math-model  and  real-time  simulators. 

Generalization  of  Processing 

Generalization  of  processing  is  a means  of  relieving  users  of  the  need 
to  specify  how  and  where  data  in  the  network  are  accessed  or  derived, 
allowing  specification  of  only  what  data  are  required.  In  RIPS,  this 
capability  is  provided  by: 

1)  Viewing  algorithms  as  relations  of  the  information  structure  in 
the  same  way  that  stored  data  are  described; 

2)  RIPL  language; 

3)  Processing  both  algorithms  and  data  by  procedures  automatically 
generated  by  the  QC/T  regardless  of  their  implementation; 

4) Automatic reduction of RIPL_n queries to RIPL_o by the GEUF.
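
The idea in item 1, that an algorithm is viewed as a relation in the same way as stored data, can be illustrated with a short sketch. The class names, the select interface, and the use of plane (x, y) grid coordinates are assumptions made for this illustration only; they are not RIPS components.

```python
import math


class StoredRelation:
    """Relation backed by stored tuples (dicts keyed by attribute name)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def select(self, **bound):
        # Return every stored tuple that agrees with the bound attributes.
        return [r for r in self.rows
                if all(r[k] == v for k, v in bound.items())]


class ComputedRelation:
    """Relation whose tuples are derived by an algorithm on demand."""

    def __init__(self, inputs, compute):
        self.inputs = inputs     # attributes the caller must bind
        self.compute = compute   # function returning the derived attributes

    def select(self, **bound):
        missing = [a for a in self.inputs if a not in bound]
        if missing:
            raise ValueError(f"unbound input attributes: {missing}")
        args = {a: bound[a] for a in self.inputs}
        return [{**args, **self.compute(**args)}]


# DISTANCE as stored data and as an algorithm, queried the same way.
stored = StoredRelation(
    [{"utm_1": (0, 0), "utm_2": (3, 4), "kilometers": 5.0}])
computed = ComputedRelation(
    ["utm_1", "utm_2"],
    lambda utm_1, utm_2: {"kilometers": math.dist(utm_1, utm_2)})

for relation in (stored, computed):
    print(relation.select(utm_1=(0, 0), utm_2=(3, 4)))
```

The caller states only what is wanted (the bound attributes); whether the answer is looked up or calculated is hidden behind the common interface.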

Generalization  of  Context 

Generalization  of  context  provides  the  means  of  predefining  the  parts 
of  queries  that  are  known  a priori,  allowing  users  to  supply  only  the 
particulars of current interest at execution time. This precludes users
from  having  to  restate  entire  queries  each  time  they  are  required.  The 
capabilities  extend  to  the  extremes  where,  on  the  one  hand,  no  parts  are 
known  a priori  (i.e.,  ad  hoc  queries),  and  on  the  other,  all  parts  are 
known  including  when  they  are  to  be  executed  (i.e.,  real-time  queries). 
Allowing  user-defined  specifications  of  when  queries  are  to  be  executed, 
by  specification  of  trigger  conditions  as  the  stimulus,  provides  alert 
processing. 

In  RIPS,  these  capabilities  are  provided  by: 

1)  Allowing  partially  predefined  queries  to  be  defined  in  RIPL  and 
stored  in  the  DD/D  as  relations  that  are  part  of  the  database; 

2)  Mapping  user-supplied  data  to  particularize  a query  by  updating 
the  stored  relations  (in  1)  through  procedures  automatically 
generated  by  the  QC/T  as  directed  by  the  GEUF; 

3)  Allowing  stimuli  to  be  user-defined  along  with  specification  of 
what  predefined  queries  are  to  be  executed  upon  receipt  of  a 
stimulus;

4)  Monitoring  of  stimuli  by  the  GEUF  and  automatic  execution  of  the 
corresponding  query  by  the  QC/T; 

5)  Use  of  user-supplied  user  views  to  establish  a tailored  view  of 
the database in the vernacular of a specific discipline;

6)  Specification  of  derivations  to  make  knowledge  concepts  visible 
by  declaring  their  semantics,  thereby  extending  the  information 
structure  and  making  such  knowledge  available  to  users  in  a con- 
sistent, uniform  manner; 

7) Use of RIPL_n to reference user views and derivations, and its
automatic reduction to RIPL_o by the GEUF.
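
Items 1 through 4 of this list (partially predefined queries, user-supplied particulars, user-defined stimuli, and automatic execution on a stimulus) can be sketched in miniature. The QueryCatalog class, its method names, and the sample data are hypothetical stand-ins for the DD/D, GEUF, and QC/T roles; they show the control flow only.

```python
class QueryCatalog:
    """Toy stand-in for predefined queries and stimuli held in the DD/D."""

    def __init__(self):
        self.queries = {}    # query name -> (function, predefined parameters)
        self.triggers = []   # (condition, query name) pairs

    def define(self, name, func, **predefined):
        # Store the parts of the query that are known a priori.
        self.queries[name] = (func, predefined)

    def run(self, name, **particulars):
        # User-supplied particulars complete (or override) the predefined parts.
        func, predefined = self.queries[name]
        return func(**{**predefined, **particulars})

    def on(self, condition, name):
        # Register a stimulus: when condition(event) holds, run the named query.
        self.triggers.append((condition, name))

    def signal(self, event):
        # Execute every predefined query whose trigger condition holds.
        return [self.run(name, **event)
                for condition, name in self.triggers if condition(event)]


PLANES = [
    {"id": "P1", "type": "recon", "max_speed": 900},
    {"id": "P2", "type": "cargo", "max_speed": 600},
]


def fast_planes(plane_type, min_speed):
    return [p["id"] for p in PLANES
            if p["type"] == plane_type and p["max_speed"] >= min_speed]


catalog = QueryCatalog()
catalog.define("fast_planes", fast_planes, plane_type="recon")
print(catalog.run("fast_planes", min_speed=800))

catalog.on(lambda event: event["min_speed"] >= 700, "fast_planes")
print(catalog.signal({"min_speed": 850}))
```

An ad hoc query corresponds to calling define with no predefined parameters; a fully predefined query with a trigger corresponds to the alert-processing extreme described above.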


Generalization  of  Formats 


Generalization  of  formats  is  a means  of  relieving  users  of  speci- 
fying how  display  formats  are  to  be  generated,  allowing  specification  only 
of  what  formats  are  required.  The  capability  provides  device  independence 
and, in conjunction with generalization of context, allows complete flexibility
in designing user interfaces.

In  RIPS,  this  capability  is  provided  by: 

1)  Specification  of  formats  in  DIAM  terms,  via  RIPL,  stored  in  the 
DD/D; 

2)  Use  of  partially  predefined  queries; 

3)  Processing  user-supplied  data  and  display  formats  through  pro- 
cedures automatically  generated  by  the  QC/T  as  directed  by  the 
GEUF. 


Generalization  of  Integrity 

Generalization  of  integrity  is  a means  of  relieving  users  of  having 
to  ensure  that  all  data  stored  and  displayed  are  current  each  time  a query 
is  executed.  The  degree  to  which  integrity  can  be  thus  controlled  is  user- 
defined  by  assertions  expressed  in  RIPL  for  each  relation/attribute  in  the 
information  structure. 

In RIPS, these capabilities are provided by:

1) Specification of integrity assertions expressed in RIPL and stored
in the DD/D, including assertions over the relations of the DD/D
itself;

2) Query modification by the GEUF, automatically appending integrity
assertions to all user queries;

3)  Evaluation  of  all  query  predicates  by  the  QC/T  before  execution, 
thereby  processing  only  valid  queries. 
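
Query modification for integrity can be sketched as conjoining stored assertions with the user's own predicate before anything is executed. The ASSERTIONS table, the attribute names, and the insert helper below are assumptions for illustration; the report does not give the form in which RIPS stores or evaluates assertions.

```python
# Hypothetical assertions, keyed by relation name; every tuple must satisfy all.
ASSERTIONS = {
    "SENSORS": [
        lambda t: t["frequency"] > 0,
        lambda t: t["pulsewidth"] > 0,
    ],
}


def with_integrity(relation, user_predicate):
    """Return the user's predicate conjoined with the relation's assertions."""
    checks = ASSERTIONS.get(relation, [])
    return lambda t: user_predicate(t) and all(check(t) for check in checks)


def insert(store, relation, new_tuple, user_predicate=lambda t: True):
    """Append new_tuple only if the modified predicate accepts it."""
    if with_integrity(relation, user_predicate)(new_tuple):
        store.setdefault(relation, []).append(new_tuple)
        return True
    return False


store = {}
print(insert(store, "SENSORS", {"frequency": 12, "pulsewidth": 3}))
print(insert(store, "SENSORS", {"frequency": -1, "pulsewidth": 3}))
```

Because the assertions are appended automatically, the user never restates them; an update that violates one is simply not processed.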

Generalization  of  Authorization 

Generalization  of  authorization  provides  a means  of  controlling  access 
to  stored  and  derived  data  by  specifying  what  data  are  prohibited  to  spe- 
cific users  or  sources,  in  terms  of  the  information  structure. 

In RIPS, these capabilities are provided by:

1) Specification of what authorization constraints are to be imposed
via RIPL and stored in the DD/D, including relations in the DD/D
itself;

2) Query modification by the GEUF, automatically appending authorization
constraints to all user queries;

3)  Evaluation  of  all  query  predicates  by  the  QC/T  before  execution, 
thereby  processing  only  authorized  queries. 
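
A comparable sketch for authorization states what is prohibited in terms of the information structure (relation and attribute) and checks each query before execution. The PROHIBITED table, the user names, and the retrieve helper are hypothetical; this is one simple way to realize the idea, not the RIPS design.

```python
# Hypothetical constraints: user -> (relation, attribute) pairs the user may not read.
PROHIBITED = {
    "analyst": {("SENSORS", "frequency")},
}


def authorize(user, relation, attributes):
    """Raise PermissionError if the query references a prohibited attribute."""
    barred = PROHIBITED.get(user, set())
    denied = sorted(a for a in attributes if (relation, a) in barred)
    if denied:
        raise PermissionError(
            f"{user} may not read {relation}/{', '.join(denied)}")


def retrieve(user, relation, attributes, rows):
    """Project rows onto attributes, but only for an authorized query."""
    authorize(user, relation, attributes)   # evaluated before execution
    return [{a: row[a] for a in attributes} for row in rows]


ROWS = [{"type": "radar", "frequency": 12, "pulsewidth": 3}]
print(retrieve("operator", "SENSORS", ["type", "frequency"], ROWS))
print(retrieve("analyst", "SENSORS", ["type", "pulsewidth"], ROWS))
```

Stating constraints against relations and attributes, rather than against files or programs, is what lets the same check apply to stored and derived data alike.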

Generalization  of  Data  Accessing 

Generalization  of  data  accessing  recognizes  that  there  is  a consistent 
model  for  the  description  of  alternative  implementations  for  a given  in- 
formation structure  with  respect  to  stored  data,  and  that  exploiting  such 
descriptions  allows  mapping  of  representation-independent  queries  into 
representation-dependent  queries.  Coupled  with  the  syntactic  translator, 
representation-dependent  queries  can  be  expressed  in  the  language  of  di- 
verse data  management  systems. 

In  RIPS,  these  capabilities  are  provided  by: 

1)  Allowing  descriptions  of  distributed  implementations  in  terms  of 
the  DIAM  model  and  the  information  structures,  and  storing  them  in 
the  DD/D; 

2) Reducing RIPL_o queries to representation-dependent language (RDL)
by the QC/T;

3) Translating RDL into a DMS language by the QC/T;

4)  Translating  data  returned  by  DMSs  into  canonic  representation  of 
the  information  structure  defined  by  RIPL  queries  by  the  QC/T. 

Generalization  of  Specifications 

Generalization  of  specifications  recognizes  the  fact  that,  for  data 
processing  functions  to  be  manageable,  specification  of  such  functions 
must  be  available  to  managers  in  a consistent  manner. 

In  RIPS,  this  capability  is  provided  by: 

1)  Viewing  all  functional  specifications  as  relations  stored  in  the 
DD/D; 

2)  Including  these  relations  in  the  information  structure,  thus  mak- 
ing them  available  just  as  any  other  data  in  the  database; 

3)  Allowing  specifications  of  integrity,  authorization,  user  views, 
etc  over  the  relations  that  contain  the  specifications; 

4)  Allowing  all  capabilities  of  RIPS  to  be  applied  to  specifications. 

Proposed Concepts Not Analyzed

Entries  in  this  column  of  Table  1 correspond  to  the  following  notes: 

1)  Automatic  update  of  QDD  parameters  is  only  partially  provided. 

Populations  and  population  distributions  of  entity  representations 
can  be  periodically  determined  by  issuing  applicable  count  queries 
to  each  node  in  the  network.  However,  the  dynamics  can  only  be 
maintained  current  for  remote  nodes  in  cases  where  a RIPS  package 
is  installed.  For  nodes  without  a RIPS  package,  specialized  ap- 
plications for  this  purpose  must  be  implemented. 

2)  Generalization  of  integrity  primarily  addresses  the  case  of  up- 
dates. The  use  of  four-valued  logic  in  query  processing  addresses 
retrievals  including  derivations  (refer  to  Context  Integrity). 

3) While user-defined probabilities are considered in generalization
of context, fuzzy logic and inferencing have not been fully analyzed
(refer to Application of Probabilistic Rules).

4) Time dependencies have not been analyzed, but the inclusion of
time domains in relations appears promising (refer to Time
Dependencies).

5)  Text  processing  techniques  have  been  proposed  but  not  fully  ana- 
lyzed (refer  to  Text  Processing). 

No  Formal  Concept  Available 

Entries  in  this  column  correspond  to  the  following  notes: 

1)  The  design  of  experiments  is  now  largely  intuitive.  Formal  tech- 
niques are  needed  but  require  further  research.  No  proposed 
solution  is  available. 

2) Data restructuring is provided when both the source schema and
target  schema  are  predetermined,  provided  through  the  use  of  pre- 
defined queries  whose  output  format  is  the  target  schema.  Auto- 
matic generation  of  an  internal  target  schema  that  is  optimized 
in  some  sense  appears  feasible  when  the  target  schema  is  selected 
from  some  small  set  of  alternatives,  and  the  rules  for  choosing 
between  them  are  known.  Automatic  generation  of  a conceptual 
schema has been proposed (Reference 12) but not analyzed for RIPS. No formal
concepts  are  available  for  automatic  generation  of  external 
schemata,  although  default  schemata  are  provided. 

3)  Concepts  for  dynamic  resource  allocation  have  not  been  proposed. 

4)  The  subject  of  parallel  processing  is  accommodated  in  the  sense 
that  subqueries  to  different  nodes  have  the  potential  for  simul- 
taneous processing  accommodated  by  the  QC/T.  Parallel  processors 
at  a single  node  have  not  been  addressed. 

5)  Algorithm  validation  has  not  yet  been  addressed.  Extensive  work 
has  been  accomplished  in  this  area,  including  formal  proofs,  but 
no  formal  concept  has  been  selected  for  application  to  RIPS. 

Requirements  Outside  RIPS 

Requirements  outside  RIPS  include: 

1)  All  external  representations  include  considerations  that  are  the 
subject  of  human  factors  and  engineering  that  are  outside  the 
scope  of  RIPS. 

2), 3), 4) The QC/T is not intended to be a DBMS as such. We have recognized
that a DBMS built on the DIAM theory would be useful and could be
employed recursively for both external and internal data management.
In such an environment, techniques for (2) deadlock resolution
and locking schemes, (3) database recovery, and (4) restoration
and restructuring would have to be provided for any current
implementation. However, in the environment intended, where the
QC/T interfaces with existing DBMSs, these techniques are considered
the responsibility of that DBMS and therefore outside the
scope of RIPS. Exceptions are discussed under Factual Knowledge
Subsystem (FKS).


12. Morton M. Astrahan and Donald D. Chamberlin: Implementation of a
Structured English Query Language. RJ1464, IBM Research Center, San
Jose, California, October 28, 1974.


CURRENT STATUS OF RIPS

Information Structure

A baseline information structure is proposed as described under Information
Structure (page 11). Four activities may affect final specifications:

1)  A complete  analysis  of  the  description  of  algorithms  in  relational 
terms.  The  major  problem  to  be  considered  is  representation  of 
algorithms  that  appear  to  require  ordering  (e.g.,  matrix  operations) 
or  ranking  (e.g.,  statistical  analyses).  Concepts  have  been  pro- 
posed. 

2)  Time  dependencies; 

3)  Fourth  normalization; 

4) Verification of RIPL_o.

Representation-Independent  Programming  Language 

Existing  concepts  are  undergoing  proof  and  completion  demonstrations. 
A paper is in progress. Work remaining includes:

1)  Final  information  structure  (includes  time  dependencies); 

2)  Proof  and  completion  analyses; 

3)  Query  balancing  analysis; 

4)  Final  syntax. 

Data Dictionary/Directory

The minimum set of relations that will be required cannot be determined
until all other specification descriptions have been determined. Which
attributes are to be under system (preprogrammed) authorization control
remains to be determined. Basic concepts of access and maintenance are
proposed but require completion of the GEUF and QC/T for validation.
Although access to the DD/D is generalized, some sample set of
user-defined DDL (see Data Description Language) must be provided for RIPS
prototype demonstration; this will be accomplished during test application
description and encoding.

QDD  needs  to  be  re-evaluated  in  the  KM  environment  and  be  cast  in  re- 
lational terminology.  Predefined  query  specifications  require  completion 
of  RIPL.  Stimulus  specifications  require  further  analysis,  including  up- 
date of  QDD.  Some  stimulus  specifications  have  been  proposed.  Display 
formats  described  in  DIAM  terms  appear  to  require  an  extension  to  DIAM  to 
describe  two-dimensional  displacements.  This  analysis  has  not  begun,  but 
will  include  analysis  of  text  processing  requirements.  Specifications  of 
orderings,  symbols,  and  graphics  have  not  begun. 

Integrity assertions and authorization constraints have received considerable
attention from other workers, whose results appear sound. However, we
have not yet attempted to incorporate them in RIPS. Both require completion
of RIPL_o.

Basic concepts of derivations have been investigated, and their requirements
are well understood. Final descriptions require completion
of RIPL_o and are required for completion of RIPL_n. The same pertains to
user views.

DIAM  descriptions  of  internal  implementations  appear  to  be  complete 
with  some  recent  extensions,  but  these  remain  to  be  validated.  This  work 
is  in  progress. 

Generalized  End-User  Facility 

Completion  of  the  GEUF  requires  completion  of  all  application  func- 
tion specifications.  Major  components  to  be  developed  are: 

1)  Query  modification  to  append  integrity  and  authorization  con- 
straints to  user  queries; 

2) Query preprocessor to reduce RIPL_n queries, which are in terms
of user views and derivations, to RIPL_o queries, which are in
terms of the basic information structure;

3)  Stimulus  monitor  to  direct  the  QC/T  to  execute  queries  corres- 
ponding to  both  user-supplied  and  internally  generated  stimuli. 

No  programming  has  begun  on  the  GEUF. 

Query  Compiler/Translator 

Preliminary  query  decomposition  is  complete  using  currently  defined 
RIPL^.  Final  validation  has  not  begun.  The  search-path  enumeration  al- 
gorithm is  complete  for  a significant  subset  of  access  path  descriptions 
and  is  being  validated.  Restrictions  that  specify  a specific  value  and 
restrictions  based  on  value  ranges  (e.g.,  employees  whose  salary  is 
greater  than  X)  have  not  been  implemented.  Considerable  work  has  been 
done  on  search-path  selection  criteria  based  on  cardinality  of  search 
paths  calculated  from  QDD  population  descriptions  and  implemented  string 
path.  Some  calculations  (e.g.,  relations  having  multiple  attributes  as 
their identifier) have not been solved. Only uniform distributions are
implemented, although descriptions of complex distributions (i.e.,
normal, Zipfian, empirical) are implemented in QDD. No heuristics have
been  implemented.  A baseline  RDAL  has  been  proposed  and  is  partially 
implemented.  No  effort  has  begun  on  syntactic  translation. 

All  programming  has  been  accomplished  using  the  math  model  simulator 
as a test bed for validation of concepts. Consequently, no data are returned
for current test queries, and compilation of data for display is
conceptual.

Math-Model and Real-Time Simulators


As  extensions  to  basic  concepts  are  developed,  they  are  being  incor- 
porated in  the  math-model  simulator  to  provide  a test  bed  for  evaluating 
QC/T concepts and other descriptions. By continuing this process, the
math-model simulator is kept in step with the concepts as they are
developed. As a result, at completion of the QC/T prototype
development, the math-model simulator will be relatively current.
However,  some  reprogramming  will  be  required  for  operating  efficiency 
because  the  modifications  will  be  ad  hoc.  These  improvements  will  also 
have  to  be  incorporated  in  the  real-time  simulator.  For  the  most  part, 
algorithms developed for the QC/T will be used intact for the simulators.

DMS  Software  Evaluation  Methodology 

Software  selection  methodology  is  complete  but  not  documented.  Be- 
cause the  methodology  includes  use  of  the  simulators,  they  must  also  be 
complete. 


RIPS  DEVELOPMENT  WORK  PLAN 

This  section  discusses  a RIPS  development  plan  and  facility  require- 
ments to  support  the  development.  In  addition,  some  estimates  of  perfor- 
mance in  an  operational  environment  are  provided,  projected  from  empirical 
results  of  current  research  software. 

Development  Plan 

Figure  13  is  a work  plan  for  completing  the  conceptual  design,  devel- 
oping prototype  software,  and  providing  applicable  documentation  for  a KM 
test  bed  using  RIPS.  The  plan  includes  a test  and  checkout  phase  designed 
to  validate  RIPS  software  in  an  existing  distributed  information  system  on 
a noninterference  basis.  That  is,  the  test  phase  is  to  demonstrate  the 
technical  aspect  of  the  system — not  the  KM  concepts  with  respect  to  manage- 
ment issues. 

Figure  13  schedules  the  tasks  to  be  performed,  as  discussed  in  the  pre- 
vious section,  and  contains  rough  order-of-magnitude  estimates  of  manpower 
and  costs  of  computer  use  for  each.  Computer  cost  estimates  are  projected 
from  empirical  results  of  current  developmental  work  of  a similar  nature 
using  Martin  Marietta  computing  facilities,  as  discussed  in  the  next  sec- 
tion. 

A total  of  58  man  years  is  estimated  over  a 4-year  development  period. 
Total  computer  support  is  120  IBM  computer  units  at  an  approximate  cost  of 
$200/unit  or  a total  of  approximately  $24,000. 

The  proposed  schedule,  in  quarter  years,  is  phased  to  incorporate  task 
results  in  succeeding  tasks  as  necessary  and  to  provide  a relatively  con- 
stant staffing  level. 

[Figure 13, work plan. Task labels surviving from the figure: Time-Dependencies Analysis; Algorithm Descriptions.]

The following are specifically excluded for the reasons indicated:

1) Dynamic data restructuring. No formal theory for the general
case is known, and the advisability of total automation is
questionable.

2)  Dynamic  resource  allocation.  Same  as  1 above. 

3)  Simulation  experiment  design  methodology.  As  pointed  out  earlier, 
experiment  design  is  largely  intuitive.  No  formal  theory  is 
known  for  the  environment  envisioned,  although  statistical  methods 
of  adequate  sample  size,  sample  selection,  etc  are  well  developed 
for  other  disciplines  and  should  be  applicable. 

4)  Automated  schema  design  for  internal  and  external  schemata.  For 
conceptual  schema  design,  the  formalisms  will  be  developed,  but 
their  automation  is  excluded.  No  formal  theory  is  known  for  in- 
ternal or  external  schemata. 

Facility  Requirements 

Existing software (math-model simulator and developmental query compiler)
is programmed in ANSI FORTRAN and is operating on Martin Marietta's
IBM 370-168 VS with TSO. Earlier versions of the MMS were run on CDC 6500
and Univac 1108.

The  executable  portion  of  the  program  requires  approximately  150K 
bytes (IBM). However, because the data of the application under study
(i.e., Information Structure, QDD, DIAM descriptions, etc) are implemented
as program data arrays, the current version, configured for a large application
description, requires approximately 4M bytes. While this size
presents no problem in the IBM Virtual System, the current version would
require reprogramming (primarily for developing a DMS to manage the data
arrays in disk storage) to run on CDC or Univac due to core size limitations.

Development  to  date  has  been  implemented  in  a research  environment 
directed  primarily  toward  verification  of  concepts.  Consequently,  no 
attempts have been made to optimize the program, either for performance
or size.


Estimated  Performance  Characteristics 

The  following  predicted  performance  of  the  QC/T  process  is  projected 
from  existing  software  execution.  A sample  application  has  been  designed 
to  evaluate  and  demonstrate  the  concepts  as  they  are  being  developed.  The 
information structure is shown below. Typical Database Task Group (DBTG)
access models for potentially distributed portions of the IS have been
described over it in DIAM terms. Note that the DISTANCE relation
could represent either stored data or an algorithm.

PLANES (ID, TYPE, MAX-SPEED, RANGE, UTM-GRID-NO, TIME-RPTD, HEADING)

ATLAS (GEOPOLITICAL-NAME, UTM-GRID-NOS)

SENSORS (TYPE, WEIGHT, FREQUENCY, PULSEWIDTH)

PLATFORMS (ID-NUMBER, PLANE-ID, SENSOR-TYPE, DATE-INSTALLED)

DISTANCE (UTM-1, UTM-2, KILOMETERS)

Nine typical queries are currently described, the most complex being as
follows:

"Print  the  ID,  max-speed  and  heading  of  all  planes  that  have  sensors 
with  frequency  of  12  and  sensitivity  of  either  .1  or  .2,  or,  frequency 
of  12  and  pulsewidth  of  3,  or,  pulsewidth  of  3 and  sensitivity  of 
either  .1  or  .2,  that  are  located  over  Crete." 

In RIPL, this query would appear as:

GET S1 .OF. PLANES/ID, MAX-SPEED, HEADING .WHERE.
    ID = S2/PLANE-ID .AND. UTM-GRID-NO = S3/UTM-GRID-NOS

GET S2 .OF. PLATFORMS/PLANE-ID .WHERE. SENSOR-TYPE = S4/TYPE

GET S3 .OF. ATLAS/UTM-GRID-NOS .WHERE. GEOPOLITICAL-NAME = 'CRETE'

GET S4 .OF. SENSORS/TYPE .WHERE. FREQUENCY = '12'
    .AND. SENSITIVITY = ('.1', '.2') .OR.
    FREQUENCY = '12' .AND. PULSEWIDTH = '3' .OR.
    PULSEWIDTH = '3' .AND. SENSITIVITY = ('.1', '.2')
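
To make the meaning of the four GET statements concrete, the following
Python sketch evaluates the same query over a small, entirely hypothetical
population. It assumes that .AND. binds more tightly than .OR. (as the
English statement of the query implies) and that SENSITIVITY is an attribute
of SENSORS, since the query qualifies on it. Attribute names use underscores
in place of hyphens. This is an illustration of the query's semantics, not of
how the QC/T processes it.

```python
# Hypothetical sample data for the five-relation information structure.
PLANES = [
    {"id": "P1", "max_speed": 900, "heading": 90, "utm_grid_no": 35},
    {"id": "P2", "max_speed": 700, "heading": 180, "utm_grid_no": 35},
    {"id": "P3", "max_speed": 800, "heading": 270, "utm_grid_no": 12},
]
ATLAS = [
    {"geopolitical_name": "CRETE", "utm_grid_nos": 35},
    {"geopolitical_name": "MALTA", "utm_grid_nos": 12},
]
SENSORS = [
    {"type": "A", "frequency": "12", "pulsewidth": "5", "sensitivity": ".1"},
    {"type": "B", "frequency": "9", "pulsewidth": "5", "sensitivity": ".2"},
    {"type": "C", "frequency": "12", "pulsewidth": "3", "sensitivity": ".9"},
]
PLATFORMS = [
    {"plane_id": "P1", "sensor_type": "A"},
    {"plane_id": "P2", "sensor_type": "B"},
    {"plane_id": "P3", "sensor_type": "C"},
]


def sensor_qualifies(sensor: dict) -> bool:
    """The three OR-ed conjunctions of the S4 qualifier."""
    freq_12 = sensor["frequency"] == "12"
    pulse_3 = sensor["pulsewidth"] == "3"
    sens_ok = sensor["sensitivity"] in (".1", ".2")
    return (freq_12 and sens_ok) or (freq_12 and pulse_3) or (pulse_3 and sens_ok)


# S4: qualifying sensor types.  S3: grid numbers over Crete.
s4 = {s["type"] for s in SENSORS if sensor_qualifies(s)}
s3 = {a["utm_grid_nos"] for a in ATLAS if a["geopolitical_name"] == "CRETE"}
# S2: planes carrying a qualifying sensor.
s2 = {p["plane_id"] for p in PLATFORMS if p["sensor_type"] in s4}
# S1: the requested attributes of planes in S2 located in S3.
s1 = [
    (p["id"], p["max_speed"], p["heading"])
    for p in PLANES
    if p["id"] in s2 and p["utm_grid_no"] in s3
]
print(s1)
```

With this sample population only plane P1 both carries a qualifying sensor
and is located over Crete.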

Because the MMS simulator is being used for analysis, the queries are
stated in RIAL, the simulator's language, which differs from RIPL in
that attribute values are replaced with a value that indicates how many
instances are in the qualifier. Thus, the predicate
GEOPOLITICAL-NAME = 'CRETE' in RIPL appears as GEOPOLITICAL-NAME=1 in RIAL,
and SENSITIVITY = ('.1', '.2') appears as SENSITIVITY=2.
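
The substitution can be sketched in a few lines of Python. The function name
to_rial and its argument shapes are hypothetical; only the two example
predicates come from the text.

```python
from typing import Sequence, Union

Value = Union[str, Sequence[str]]


def to_rial(attribute: str, value: Value) -> str:
    """Replace the attribute value(s) with the number of values in the qualifier."""
    count = 1 if isinstance(value, str) else len(value)
    return f"{attribute}={count}"


print(to_rial("GEOPOLITICAL-NAME", "CRETE"))    # GEOPOLITICAL-NAME=1
print(to_rial("SENSITIVITY", (".1", ".2")))     # SENSITIVITY=2
```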

A QDD is defined over the Information Structure, including populations
and population distributions, which are defaulted to uniform distribution
for the analysis. In processing this query, the search-path selection
algorithm enumerated 249 different access paths, which resulted in 368
alternative access programs to cover the query. The CPU time (excluding
printout) to compile the programs and compute the cardinality of access
paths traversed for each, to permit selection of the optimum path, was
1.9 CPU seconds. The added cost of extending the computations to include
nonuniform population distributions is expected to be negligible. Because
of the decomposition routines used, the cost of determining which
subqueries relate to which nodes in a distributed environment is also
negligible.
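
As an illustration of cardinality-based selection, the sketch below ranks
alternative access programs by the number of instances each is expected to
traverse under a uniform distribution. The cost model (a simple sum over
steps), the Step fields, and all populations are assumptions made for this
example; the algorithm used in the MMS is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    population: int         # instances reachable at this step
    distinct_values: int    # distinct values of the qualifying attribute
    qualifying_values: int  # values named in the qualifier (the RIAL count)


def expected_instances(step: Step) -> float:
    """Uniform distribution: every distinct value covers an equal share."""
    return step.population * step.qualifying_values / step.distinct_values


def program_cost(steps: List[Step]) -> float:
    """Total instances traversed by one access program (assumed cost model)."""
    return sum(expected_instances(s) for s in steps)


def cheapest(programs: Dict[str, List[Step]]) -> str:
    """Name of the access program with the smallest estimated cost."""
    return min(programs, key=lambda name: program_cost(programs[name]))


# Two hypothetical covers of the same query.
programs = {
    "enter_via_sensors": [Step(1000, 50, 1), Step(200, 100, 2)],
    "enter_via_atlas": [Step(5000, 500, 1), Step(400, 4, 1)],
}
best = cheapest(programs)
```

A nonuniform distribution would change only expected_instances, which is
consistent with the expectation that the extension adds little cost.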

Generation of representation-dependent access language (RDAL) programs
for each of the covers is expected to add about 50% to the CPU time,
increasing the processing time for the sample query to 2.8 seconds.

Addition of heuristics to extend selection optimization is expected
to increase processing time, but the increase should be partially offset
by restricting the number of covers (from total enumeration) to be
generated. The heuristics include analysis of how the search paths are
implemented (encodings), which is not currently considered.

Syntactic  translation  is  not  implemented,  but  is  estimated  to  be  on 
the  order  of  0.1  second  for  the  example.  GEUF  considerations  are  not  im- 
plemented but  are  estimated  to  be  on  the  order  of  0.2  second,  using  de- 
fault display  formats. 

The  sample  query,  which  is  relatively  complex,  is  estimated  to  require 
approximately  3.1  seconds  of  CPU  time  with  current  implementation  tech- 
niques, and  can  probably  be  reduced  considerably  with  improved  techniques. 

A simple  query — over  a single  relation — should  require  less  than  0.5  sec- 
ond, based  on  similar  projections. 
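
The arithmetic behind the 3.1-second estimate can be checked with the short
Python sketch below. The figures are those quoted above; the variable names
are ours, and Decimal is used only to avoid binary rounding in the printout.

```python
from decimal import Decimal

# CPU-second figures quoted in the text for the sample query.
path_selection = Decimal("1.9")          # compile programs, compute cardinalities
after_rdal_generation = Decimal("2.8")   # stated total once RDAL generation is added
syntactic_translation = Decimal("0.1")   # estimate; not implemented
geuf_processing = Decimal("0.2")         # estimate; default display formats

rdal_increment = after_rdal_generation - path_selection
total = after_rdal_generation + syntactic_translation + geuf_processing

print(rdal_increment)  # 0.9, slightly under the nominal 50% (0.95) of 1.9
print(total)           # 3.1
```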

The preceding projections are for the case of an ad hoc query. Of
course, in a stable operational environment, most users' queries are either
totally or partially predefined. For totally predefined queries, the
potential obviously exists for saving the compiled and translated program,
reducing the time to a single retrieval. For partially predefined queries,
there is some potential for saving the selected search path, reducing the
time to compilation and translation.
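
The two levels of saving can be sketched as a pair of lookup tables. The
QueryCache class and the stand-in select_path and compile_and_translate
functions below are hypothetical; they only count how often each stage
would run.

```python
from typing import Callable, Dict, Tuple


class QueryCache:
    """Reuses earlier work for predefined queries (a sketch, not the RIPS design)."""

    def __init__(self,
                 select_path: Callable[[str], str],
                 compile_and_translate: Callable[[str, Tuple], str]) -> None:
        self._select_path = select_path
        self._compile = compile_and_translate
        self._paths: Dict[str, str] = {}                      # template -> search path
        self._programs: Dict[Tuple[str, Tuple], str] = {}     # full query -> program

    def program_for(self, template: str, parameters: Tuple) -> str:
        key = (template, parameters)
        if key in self._programs:          # totally predefined: retrieval only
            return self._programs[key]
        if template not in self._paths:    # ad hoc: full search-path selection
            self._paths[template] = self._select_path(template)
        program = self._compile(self._paths[template], parameters)
        self._programs[key] = program
        return program


calls = {"select": 0, "compile": 0}


def select_path(template: str) -> str:
    calls["select"] += 1
    return f"path({template})"


def compile_and_translate(path: str, parameters: Tuple) -> str:
    calls["compile"] += 1
    return f"{path} with {parameters}"


cache = QueryCache(select_path, compile_and_translate)
cache.program_for("planes over ?", ("CRETE",))   # ad hoc: select and compile
cache.program_for("planes over ?", ("MALTA",))   # partially predefined: compile only
cache.program_for("planes over ?", ("CRETE",))   # totally predefined: neither
```

After the three calls, path selection has run once and compilation twice.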

No GEUF software exists. Because it is conceptually a simple
table-driven program, it is projected to require on the order of 15K to
25K bytes (IBM 370-165), most of which would be the preprocessor and
stimulus monitor routines and in-core buffers. Buffer size is of course an
installation-peculiar characteristic and could vary greatly, depending on
the number of users and the nature of their queries.

When DD/D implementation is complete, to allow application and
implementation descriptions to be stored on external storage media (with
proper buffering for efficiency), the total size of a RIPS package is
estimated to be less than 200K bytes of storage (IBM 370-168).

No  estimates  of  elapsed  time  for  processing  queries  in  a distributed 
environment  have  been  attempted  but,  in  the  ensemble,  the  overhead  required 
by  the  RIPS  should  be  considerably  offset  by  the  optimization  of  search 
paths,  which  would  be  a difficult  programming  task  for  any  other  technique 
used.  Actual  processing  time  at  remote  nodes  in  the  network  is  of  course 
outside  the  control  of  RIPS,  except  that  we  are  assured  that  an  efficient 
program  is  submitted  by  RIPS.  The  time  required  for  ad  hoc  or  new  queries 
in  the  RIPS  environment  would  be  less  by  orders  of  magnitude  because  the 
alternative  is  to  design  and  write  corresponding  application  programs  for 
each  node  to  be  accessed,  which  could  take  days  or  even  weeks. 

Other  Considerations 

Current development has been accomplished by a very small permanent
staff. To ensure consistency through the conceptual development, it is
critical that the cadre be maintained through the first two years of
development.


Timely addition of new personnel with specific skills is also critical
to the work plan and is somewhat problematical. To help alleviate the
education problem in view of the advanced technology being employed, we have
begun negotiations with the University of Colorado, Computer Science
Department, to teach a two-semester graduate course introducing the major
concepts.

Estimates  and  plans  for  technology  transfer  and  training  of  using  or- 
ganizations are  not  included. 


CONCLUSIONS 

Before presenting our conclusions, we summarize KM requirements in six
major functions and compare the methods by which these functions can be
satisfied by current programming techniques with the methods proposed by
RIPS. This summary discussion provides a framework for analyzing the
problems inherent in today's practice and determining the potential of
alternative reported approaches, currently in research and development,
toward a solution.


Summary  of  Requirements 

Most KM requirements can be summarized in six major functions. The
first  is  knowledge  sharing.  The  advent  of  Generalized  Database  Manage- 
ment Systems  has  provided  a means  for  managers  to  implement  the  'data  as 
a resource'  policy.  The  'knowledge  as  a resource'  policy  goes  further, 
requiring  that  derivations  of  information  from  stored  data  and  algorithms 
also  be  shared.  To  some  degree,  the  current  practice  of  application  pro- 
gram development  can  make  knowledge  concepts  available  through  careful 
partitioning  of  requirements  and  modularization  of  programs  and  subrou- 
tines. However,  current  practice  includes  allocating  like  requirements 
to  larger  modules,  binding  individual  concepts  to  the  current  context  and 
semantics.  Thus,  when  the  organization's  information  requirements  change, 
it  is  difficult  to  extract  required  knowledge  from  existing  applications 
for  use  in  the  new  context.  The  difficulty  arises  partially  from  the  lack 
of  visibility  of  the  concept,  usually  being  available  only  in  program  doc- 
umentation as  a narrative  description  of  the  original  purpose  and  imple- 
mentation. Even  when  we  can  recognize  the  block  of  code  that  implements 
the  concept,  we  must  program  a unique  linkage  to  use  the  code  in  the  new 
context,  or  repeat  it  in  a new  application  program. 

The RIPS solution recognizes knowledge concepts as derivations over
the relations that describe stored data, algorithms, and other derivations.
The concepts, either in the context of some stored data or independent of
data, thus become represented by additional relations or attributes of
existing relations, and are thereafter available in any RIPL queries. The
concepts are visible, and users need not program unique linkages for each
new context.


The second major function is metadata management. The KM concept
recognizes that what may be specifications to an application is important
information to managers, and must therefore be accessible just as any other
information. In the current practice of developing application programs,
requirements of the application, including user profiles, and
implementation details are available only in the program documentation in
narrative descriptions. The KM concept recognizes that user profiles
constitute the organizational data flow, which is valuable information to
the EA; that implementation details are necessary information to the DBAs;
and that the derivations and productions are knowledge that is subject to
sharing.

The  RIPS  solution  is  to  include  the  requirements,  implementation  de- 
tails, and  derivations  in  the  DD/D,  and  to  include  the  contents  of  the 
DD/D  as  relations  in  the  information  structure.  Thus,  the  DD/D  becomes 
a natural  part  of  the  database,  and  access  to  its  information  is  avail- 
able with  all  the  capabilities  provided  for  other  data  sources. 
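
A present-day analogy, offered only as an illustration, is a relational
catalog that can be queried with the same language as application data. The
Python sketch below uses SQLite's built-in sqlite_master table and the
table_info pragma. SQLite stands in for the DD/D here and is not part of
RIPS; the table definitions are simplified from the sample information
structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensors (type TEXT, weight REAL, frequency TEXT, pulsewidth TEXT)"
)
conn.execute(
    "CREATE TABLE platforms (id_number TEXT, plane_id TEXT, "
    "sensor_type TEXT, date_installed TEXT)"
)

# Metadata is retrieved with an ordinary SELECT, exactly like application data.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(sensors)")]

print(tables)
print(columns)
```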

The  third  major  function  is  information  integrity.  The  KM  concept 
recognizes  that  information  resources  of  an  organization  must  be  pro- 
tected just  as  physical  resources  are.  The  entity  name  or  value  of  in- 
dividual data  items  being  added  or  changed  in  a database  is  not  inde- 
pendent of  other  data.  While  existing  DBMSs  can  constrain  individual 
values to a range, specifying the dependence on other data requires that
application programs be developed to ensure integrity in view of the
dependence. Similarly, access to an individual data item cannot always
be restricted in itself; the restriction often depends on other
associations. Again, application programs must be developed to constrain
access to authorized users only. In developing application programs,
integrity  requirements  play  an  important  role  in  the  partitioning  and 
allocation  of  functions  to  program  modules.  Because  resulting  modules 
include  both  knowledge  concepts  and  their  integrity  and  authorization 
constraints,  any  change  to  either  may  require  the  module  to  be  divided 
into  new  modules  in  which  the  knowledge  concept  and  its  constraints  are 
properly  aligned,  along  with  the  corresponding  linkage  to  other  modules. 

In RIPS, the integrity and authorization constraints are defined at
the information structure level, and any derivations over constrained
data or processes are automatically constrained accordingly when they are
processed by the GEUF. If the derivation changes to include more or less
of the existing information structure, applicable constraints for the
corresponding associations are automatically applied. Similarly, if
constraints change, existing derivations are automatically constrained
accordingly. In neither case are the user's queries or interfaces affected.
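
The idea of declaring constraints once, against the information structure,
and having every derivation inherit them can be sketched as follows. The
role names, the REQUIRED_ROLE table, and the two functions are hypothetical
and cover only authorization; they are not the GEUF mechanism.

```python
from typing import Dict, Iterable, Set

# Authorization constraints declared once, per attribute of the information structure.
REQUIRED_ROLE: Dict[str, str] = {
    "SENSORS.FREQUENCY": "analyst",
    "PLANES.HEADING": "operator",
}


def roles_needed(derivation_attributes: Iterable[str]) -> Set[str]:
    """Constraints follow from the attributes a derivation touches."""
    return {REQUIRED_ROLE[a] for a in derivation_attributes if a in REQUIRED_ROLE}


def may_run(user_roles: Iterable[str], derivation_attributes: Iterable[str]) -> bool:
    """True when the user holds every role the derivation's attributes require."""
    return roles_needed(derivation_attributes) <= set(user_roles)


derivation = {"PLANES.ID", "PLANES.HEADING"}
extended = derivation | {"SENSORS.FREQUENCY"}   # derivation grows; no constraint code changes

print(may_run({"operator"}, derivation))  # True
print(may_run({"operator"}, extended))    # False
```

Neither the derivation nor the caller names a constraint explicitly; changing
REQUIRED_ROLE changes the outcome for every derivation that touches the
affected attribute.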

The  fourth  function  is  distributed  access,  arising  from  the  realiza- 
tion that  distributed  information  systems  exist  today  and  will  continue 
to  be  required  in  the  foreseeable  future  (e.g.,  DoD's  Delegated  Produc- 
tion Policy).  Therefore,  the  requirement  is  to  provide  access  to  the 
distributed resources as though there were a single homogeneous
implementation. Today's capabilities for accessing data in this environment
are limited to writing programs in the languages of the various remote
systems that contain the needed data or processes, and writing another
program in the language of the host node to accept the user's input and
compile the returned data in the required format.

The  RIPS  approach  allows  specification  of  what  data  or  processes  are 
required  in  a single  implementation-independent  language,  and  the  neces- 
sary programs  for  accessing  the  remote  nodes  are  automatically  generated 
by  the  QC/T,  based  on  DIAM  descriptions  of  the  various  implementations, 
and  corresponding  syntactic  translators. 

Returned  data  are  automatically  compiled  in  the  response  specified  by 
the  user's  query.  Any  special  display  formats  or  user  interfaces  are  de- 
clared (again  using  RIPL)  at  the  source  node  for  the  query  and  automati- 
cally processed  by  the  GEUF  and  QC/T.  Any  changes  to  the  distributed  im- 
plementations or  user  interfaces  are  recorded  in  DIAM  terms  in  the  DD/D 
without  affecting  the  user's  query. 
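
The division of labor described above can be sketched as a table of
per-node syntactic translators driven by a description of where each
relation resides. Both target syntaxes, the node names, and the LOCATION
table below are invented for illustration; they do not represent RDAL, DIAM
descriptions, or any real system's language.

```python
from typing import Callable, Dict, List, Tuple

SubQuery = Tuple[str, str, str]  # (relation, attribute, value)


def translate_for_node_a(q: SubQuery) -> str:
    relation, attribute, value = q
    return f"FIND {relation} USING {attribute} = '{value}'"


def translate_for_node_b(q: SubQuery) -> str:
    relation, attribute, value = q
    return f"RETRIEVE {relation} WHERE {attribute} EQ '{value}'"


# One translator per remote system (hypothetical syntaxes).
TRANSLATORS: Dict[str, Callable[[SubQuery], str]] = {
    "node_a": translate_for_node_a,
    "node_b": translate_for_node_b,
}

# Stand-in for the implementation descriptions: which node holds which relation.
LOCATION: Dict[str, str] = {"ATLAS": "node_a", "SENSORS": "node_b"}


def generate_programs(subqueries: List[SubQuery]) -> List[Tuple[str, str]]:
    """Produce (node, program text) pairs without the user naming any node."""
    programs = []
    for q in subqueries:
        node = LOCATION[q[0]]
        programs.append((node, TRANSLATORS[node](q)))
    return programs


subqueries = [("ATLAS", "GEOPOLITICAL-NAME", "CRETE"), ("SENSORS", "FREQUENCY", "12")]
programs = generate_programs(subqueries)
```

Moving ATLAS to another node means editing LOCATION only; the subqueries,
which stand for the user's query, stay the same.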

The fifth function is implementation flexibility. The KM concept
recognizes that, in the foreseeable future, a single implementation
technique cannot satisfy all performance requirements, and that existing
techniques that have evolved from necessity must continue to be used.
Users' information needs exist independent of the implementation. The
current practice of stating users' queries by application programs that are
bound to the current implementation has proved to be costly when a change
is required to either the internal or external implementation.

To provide the flexibility required to allow migration of
implementations to take advantage of new hardware or more efficient
techniques and to react to ever-changing user needs, the information
content of queries must be separated from implementation details. RIPS
provides the separation through the use of RIPL to state the information
needs. Details of implementations are provided by DIAM descriptions, and
the combined effects of the GEUF and QC/T automatically map the queries to
the internal and external representations. Thus, changes to either have
no effect on the semantics of users' needs.

The sixth function is implementation aids. In view of the preceding,
the choice of implementation techniques is an important consideration in
the cost and performance of information systems. The statement of
system requirements, including organizational data flows, data populations,
etc., must be sufficient to enable designers to make rational decisions.
Because there is no formal methodology generally available that
incorporates all decision parameters, and because the analyses are complex
and multidimensional, current methods are largely ad hoc, developed just
before they are needed.

RIPS recognizes the difference between issues that are quantifiable
and those that are not. Quantifiable issues that include the statics and
dynamics of information flow are formally described by QDD and evaluated
using discrete event simulation, using the same formalisms used for the
operational QC/T. Unquantifiable issues are treated as constraints of
alternative implementation techniques, including the use of commercial
products, and serve to either eliminate candidates that cannot satisfy
them or add in the cost to provide them.

RIPS  goes  further,  recognizing  that  the  requirements  are  not  a one- 
time phenomenon,  but  are  constantly  changing  to  keep  in  step  with  real- 
world  changes  both  inside  and  outside  the  control  of  the  using  organi- 
zation. Thus,  quantifiable  requirements  must  be  maintained,  and  RIPS 
provides  the  means  by  describing  them  as  relations  that  are  part  of  the 
database  and  allowing  whatever  degree  of  concurrency  is  called  for  in 
the  environment. 


Summary  of  Today's  Problems 

Major problems inherent in current practices of application program/DBMS
development in satisfying the preceding requirements can be summarized
by two characteristics. First is the time (and consequently, cost)
required to automate manual systems (time during which the original
requirements may change) and to implement changes in reaction to inevitable
real-world changes. This time is attributable to the practice of
incorporating multiple concepts in application programs, bound together by
the various implementation techniques employed.

During initial development, individual functional requirements are
identified, then allocated to program modules on the basis of common
requirements. The commonality among individual functional requirements may
include use of derivations, formats, users, data, source, response time,
or other characteristics, and determination of allocations is a
combinatorial problem. Missing, changing, and misunderstood requirements
contribute to the problem, requiring reallocation during development,
invalidating completed code, and requiring restructuring of the database,
which propagates throughout the design.

During  operation,  a change  to  a single  requirement  cannot  be  made 
without  analyzing  its  effects  on  associated  encodings  of  other  require- 
ments that  must  not  change. 

The availability of automated information is controlled by this
process. Even when the data we require are in the database, the time
necessary to develop application programs to state the query determines the
time in which we can retrieve the information. Once the program is
developed, database implementation controls the response time. If the
response time does not meet requirements, the implementation must be
changed, but because the queries are bound to the implementation, those
programs must also be changed, which in turn controls the time in which the
new information becomes available.

The second major problem in today's practice is manageability. Just
as information is the basis for managing an organization, specifications
or metadata are the basis for managing the information; and just as the
availability of information affects the management of an organization, the
availability of metadata affects the management of the information.

In  today’s  environment,  we  have  recognized  that  automation  can  increase 
the  availability  of  information  and  thereby  improve  the  organization's  man- 
agement potential.  At  the  same  time,  we  have  left  the  information  neces- 
sary to  manage  the  information  to  be  manually  implemented.  The  irony  is 
that  we  are  replacing  archaic  information  systems  with  modern,  sophisti- 
cated systems,  but  attempting  to  manage  them  by  archaic  information  sys- 
tems. As  the  size  of  information  systems  increases,  so  will  the  amount 
of  metadata,  magnifying  information  management's  dilemma. 

The  Martin  Marietta  Database  Research  Project  has  concentrated  on  pre- 
cisely these  two  problems,  which  can  be  reduced  to  a single  problem  of  im- 
proving information  availability  when  we  include  all  information.  The 
time  required  to  state  a query  is  attacked  by  providing  a representation- 
independent  nonprocedural  programming  language  and  the  QC/T  to  relieve 
users  of  having  to  know  the  details  of  implementation.  The  time  to  ob- 
tain a response  is  attacked  by  optimizing  query  processing  for  the  imple- 
mentations that  are  provided.  The  time  required  to  incorporate  changes 
is  attacked  by  separating  application  functions  and  generalizing  their 
processing  by  the  GEUF  and  QC/T.  The  availability  of  metadata  is  pro- 
vided by  including  specifications  as  relations  in  the  DD/D  and  including 
the  DD/D  as  a natural  part  of  the  database. 

Industry's Solution to Today's Problems

Industry is taking three different approaches to solving today's
problems. The hardware/firmware approach is directed primarily toward
providing faster response to a stated query. The major impetus is toward
content-addressable memory (CAM), necessary for associative
memories/processors (e.g., References 25, 26, 27, 28) and the database
machine (Reference 29). Its potential is to eliminate database design
problems by providing a single method of implementation for all data. This
would make the need for a single conceptual model obvious and the mapping
to the data consistent, thus allowing for a representation-independent
query language.

25. S. S. Yau and H. S. Fung: "Associative Processor Architecture - A
Survey," ACM Computing Surveys, Vol 9, No. 1, March 1977, pp 3-28.

26. G. A. Anderson and R. Y. Kain: "A Content-Addressed Memory Design for
Data Base Applications," Proc. 1976 International Conference on Parallel
Processing, IEEE, 1976, pp 191-195.

27. D. L. Slotnick: "Logic per Track Devices," Advances in Computers, Vol
10, Academic Press, New York, 1970, pp 291-296.

28. C. Y. Lee and M. C. Paull: "A Content Addressable Distributed Logic
Memory with Applications to Information Retrieval," Proc. IEEE, 76,
June 1963, pp 924-932.

29. David K. Hsiao and Stuart E. Madnick: "Database Machine Architecture
in the Context of Information Technology Evolution," Proc. Third
International Conference on Very Large Databases, Tokyo, Japan, October 1977.
However, requirements for knowledge sharing, metadata management,
information integrity, distributed access (to existing systems), and user
interface flexibility would still have to be developed to satisfy the KM
concept.

The software approach includes Relational Database Systems being
developed by IBM (System R; Reference 20) and the University of California
(INGRES; Reference 13). These systems implement the conceptual data-model
approach and provide a representation-independent query language. They
also separate the application functions of integrity and authorization,
independent of queries, and generalize their processing. Mapping of
queries to the internal storage is provided by conventional DBMS
techniques, with a range of implementation alternatives available to the
database designer. They both require application programs to be written in
a general-purpose programming language to direct the processing of derived
information and external interfaces. Thus, when they become commercially
available, they will not satisfy the KM requirements of knowledge sharing,
metadata management, distributed access, user-interface flexibility, and
implementation aids.

The  third  approach  is  RIPS.  It  provides  the  functions  of  CAM  through 
software  via  the  QC/T.  RIPS  does  not  offer  new  computer  systems  tech- 
nology with  respect  to  hardware  or  software.  What  is  needed  exists  today. 
Refinement  of  new  concepts  of  information  processing,  to  replace  those 
born  at  the  inception  of  computers  and  strengthened  through  years  of  prac- 
tice, and  their  implementation,  is  the  new  technology  offered — satisfy- 
ing KM  requirements. 

The  reported  approaches  and  progress  toward  satisfying  today's  prob- 
lems make  it  unlikely  that  a satisfactory  system  for  realizing  KM  con- 
cepts will  be  offered  in  the  next  decade.  At  the  current  level  of  fund- 
ing, this  includes  RIPS. 


Conclusion 

KM  concepts  concentrate  on  the  use  of  information  in  government  or- 
ganizations, recognizing  the  functional  and  organizational  requirements 
to  permit  more  effective  management  of  this  essential  resource.  Martin 
Marietta's  Database  Research  Project  is  concentrating  on  the  technical 
means  of  improving  total  information  availability,  largely  ignoring  or- 
ganizational effects  in  any  particular  context.  The  high  degree  of  cor- 
respondence shown  between  KM  functional  requirements  and  RIPS  capabili- 
ties indicates  that  the  RIPS  will  provide  a powerful  test  bed  for  vali- 
dating KM's  organizational  and  management  concepts. 

20. M. Stonebraker: "Implementation of Integrity Constraints and Views by
Query Modification," Proc. ACM SIGMOD International Conference on
Management of Data, San Jose, California, May 1976, pp 65-78
(ed. W. F. King).

13. M. Stonebraker, E. Wong, and P. Kreps: "The Design and Implementation
of INGRES," ACM Transactions on Database Systems, Vol 1, No. 3,
September 1976, pp 189-222.
GLOSSARY OF TERMS

        associated qualification rate
        computer-aided instruction
CAM     content-addressable memory
CPU     computer processing unit
DBA     database administrator
DBMS    database management system
DBTG    database task group
DD/D    data dictionary/directory
        data description language
DIAM    data-independent accessing model
        data manipulation language
DMS     data management system
EA      enterprise administrator
        end-user facility
        factual knowledge subsystem
FSTDS   fuzzy-set-theoretic data structure
GDBMS   generalized database management system
GDMS    generalized database management system
GEUF    generalized end-user facility
ID      identification
IS      information structure
        judgement support subsystem
KBPA    knowledge-based personal assistant
KM      knowledge management
        knowledge resource center
        management information system
MMS     math model simulator
PACER   Program-Assisted Console Evaluation and Review
        procedural knowledge subsystem
QC/T    query compiler/translator
QDD     quantitative data description
        requirement
RDAL    representation-dependent accessing language
        representation-dependent language
RIAL    representation-independent accessing language
RIPL    representation-independent programming language
RIPS    representation-independent programming system
        rough order of magnitude
        real-time simulator
        translation and control subsystem
        third normal form
        time-sharing option


MISSION
of
Rome Air Development Center

and Intelligence. The principal technical mission areas
are communications, electromagnetic guidance and control,
surveillance of ground and aerospace objects, intelligence
data collection and handling, information system technology,
ionospheric propagation, solid state sciences, microwave
physics and electronic reliability, maintainability and
compatibility.